Abstract

Communication systems are usually designed by assuming perfect channel state information (CSI). 
However, in many practical scenarios, only a noisy estimate of the channel is available, which may 
strongly differ from the true channel. This imperfect CSI scenario is addressed by introducing the notion 
of estimation-induced outage (EIO) capacity. We derive a single-letter characterization of the maximal 
EIO rate and prove an associated coding theorem and its strong converse for discrete memoryless 
channels (DMCs). The transmitter and the receiver rely on the channel estimate and the statistics of the 
estimate to construct codes that guarantee reliable communication with a certain outage probability. This 
ensures that in the non-outage case the transmission meets the target rate with small error probability, 
irrespective of the quality of the channel estimate. Applications of the EIO capacity to a single-antenna 
(non-ergodic) Ricean fading channel are considered. The EIO capacity for this case is compared to 
the EIO rates of a communication system in which the receiver decodes by using a mismatched ML 
decoder. The effects of rate-limited feedback to provide the transmitter with quantized CSI are also
investigated. 

Index Terms 

Outage capacity, channel uncertainty, channel estimation, ML decoding, coding theorem, memoryless
channel, quality-of-service, compound channel, mismatch decoding, fading channel, AWGN
channels, capacity formula, channel coding, Rician channels, side information, quasistatic Ricean fading 
channel, quantized CSI, feedback. 

I. Introduction 

Channel uncertainty caused by time variations/fading, interference, or channel estimation
errors can severely impair the performance of wireless systems. Even if the channel is
quasi-static and the interference is small, uncertainty induced by imperfect channel state information
(CSI) at the transmitter remains. As a consequence, studying the limits of reliable information 
rates in these scenarios is an important problem. Obviously, this requires some precise definition 
of the communication model and what one means by "reliability" in the situations of interest. 

In selecting a probabilistic model for a wireless communication scenario where the channel 
parameters are time-varying, several factors must be considered. These include the physical 
and statistical nature of the channel disturbances (e.g. fading distribution, channel estimation 
method, practical design constraints, etc.), the information available to the transmitter and/or to 
the receiver and the presence of any feedback link [1]. Assume that a specific instance of a
discrete memoryless state-dependent channel (DMC) with discrete input $x \in \mathscr{X}$, discrete state
$s \in \mathscr{S}$ and discrete output $y \in \mathscr{Y}$ is characterized by a set of conditional probability distributions
(PDs) $\mathcal{W}_\Theta = \{W_\theta : \mathscr{X} \times \mathscr{S} \mapsto \mathscr{Y}\}_{\theta \in \Theta}$, parameterized by the vector $\theta \in \Theta$, where $\Theta$ is a
set of parameters (not necessarily finite). The transition PD of the $n$-memoryless extension with
inputs $\mathbf{x} = (x_1, \dots, x_n)$, channel states $(\theta, \mathbf{s}) = (\theta, s_1, \dots, s_n)$ and outputs $\mathbf{y} = (y_1, \dots, y_n)$ is
given by

$$W_\theta^n(\mathbf{y}|\mathbf{x},\mathbf{s}) = \prod_{i=1}^{n} W_\theta(y_i|x_i,s_i), \tag{1}$$

where $\theta$ is assumed to be fixed during the communication with PD $\mu_\theta$, but $s_i$ varies from letter to
letter, drawn independent and identically distributed (i.i.d.) from $\mu_{S|\theta}$. This channel model is suitable
for many wireless communication scenarios. The channel is said to be non-ergodic if the conditional
PD does not depend on the state $s$, and ergodic if the conditional PD and $s$ do not depend on $\theta$.
Otherwise the channel model (1) might have fixed and time-varying channel states or parameters
$(\theta, \{s_i\}_{i=1}^{\infty})$ during the communication.
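
As a minimal illustration of the memoryless extension (1), the following sketch multiplies per-letter transition probabilities for one fixed $\theta$. The table layout `W[(x, s)][y]`, the function name and the toy binary channel are assumptions made for this example only; they are not part of the paper.

```python
from math import prod

def block_transition_prob(W, y, x, s):
    """Probability of the output block y given inputs x and states s, as in (1).

    W[(xi, si)][yi] holds the per-letter probability W_theta(yi | xi, si)
    for one fixed theta (hypothetical table layout chosen for this sketch).
    """
    if not (len(y) == len(x) == len(s)):
        raise ValueError("y, x and s must have the same block length n")
    # Memorylessness: the block probability is the product over the n letters.
    return prod(W[(xi, si)][yi] for yi, xi, si in zip(y, x, s))

# Toy binary channel: state 0 is noiseless, state 1 flips the input with prob. 0.2.
W_theta = {
    (0, 0): [1.0, 0.0], (1, 0): [0.0, 1.0],
    (0, 1): [0.8, 0.2], (1, 1): [0.2, 0.8],
}
p = block_transition_prob(W_theta, y=[0, 1, 1], x=[0, 1, 0], s=[0, 1, 1])
```

Here the three letters contribute the factors 1.0, 0.8 and 0.2, so the block probability is their product.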

Existing approaches dealing with channel uncertainty mainly correspond to two scenarios. The 
first scenario might be characterized by two facts: (i) the transmitter and the receiver are designed 
without knowledge of the law governing the channel variations $\mu_{\theta S}$ and (ii) the receiver
only has access to $(\hat\theta, \{v_i\}_{i=1}^{\infty})$, i.e., noisy estimates of $(\theta, \{s_i\}_{i=1}^{\infty})$, while the transmitter is provided
with $(\hat\theta, \{v_i\}_{i=1}^{\infty}, \theta, \{s_i\}_{i=1}^{\infty})$. A reasonable approach in this case consists in using mismatched
decoders [2]-[5] where the decoding rule is restricted to be a metric of interest, which is not
necessarily matched to the channel governing the communication. A second scenario arises when
the transmitter and the receiver are both assumed to be aware of the laws governing the channel
variations. Let us assume first the case where the channel is non-ergodic ($s$ is not present), and the
transmitter, but not the receiver, is aware of the true channel states. In this case, universal decoders
[6] can still achieve the capacity, attaining the same performance as the maximum-likelihood
(ML) decoder tuned to the true channel. Loosely speaking, a universal decoder for a parametric
family of channels is a decoder independent of the specific channel in use, that nevertheless
performs asymptotically as well as the ML decoder tuned to the true channel. Many families of
channels admit universal decoders (see [6], [7] and [8], and references therein). Finally, we may 
also identify an intermediate scenario, where the transmitter only knows noisy channel estimates. 
Caire and Shamai [9] have studied the capacity of ergodic channels with imperfect CSI at the 
transmitter (CSIT) and/or at the receiver (CSIR), providing optimal power allocation strategies. 
Lapidoth and Shamai [10] studied the robustness of Gaussian codebooks and scaled
nearest neighbor decoding over a flat-fading channel with respect to inaccuracies in the CSI,
characterizing the performance loss that results from channel estimation errors. Additional results
obtained by Lapidoth and Moser [11] show that for non-coherent channels (absence of CSI) the 
asymptotic MIMO capacity increases doubly-logarithmically with the SNR but with a reduced 
slope. This line of work was initiated by Marzetta and Hochwald [12], and then explored by 
Zheng and Tse [13], to study the non-coherent capacity under a block-fading assumption. 

A. Motivation 

The results recalled above correspond to communication scenarios where the laws governing 
the parameters of the channel are supposed unknown, or, whenever imperfect CSI is assumed 
at the transmitter and/or the receiver, the channels are assumed ergodic ($\theta$ is not present).
However, in many practical wireless systems operating over fading channels, the ergodic
assumption is not necessarily satisfied since some of the channel parameters may be almost
constant for a period of time, so that their randomness cannot be averaged out (or removed)
over time. This effect is even stronger when delay constraints are tight [14]. In such systems, a
channel estimate is required for each period of time, and probably the most common method for 
channel estimation is the use of a training sequence. The transmitter sends a known sequence of 
symbols, allowing the receiver to estimate the channel state and send it back to the transmitter via 
a noisy feedback link. Once the channel state has been estimated, the receiver decodes the rest
of the transmission. Methods for estimating the channel parameters online or refining the initial 
estimate have been proposed, but we do not consider them in this work. In our setting, once the 
initial estimate has been obtained in the training phase, it is assumed fixed. This corresponds 
to how most actual standards work. Note that training introduces a throughput (and power) 
penalty since it requires frequent retransmission of the training symbols carrying no information. 
Hence, to reduce this undesired effect, it may be preferable to sacrifice channel estimation 
accuracy for small training overhead. Nevertheless, the quality-of-service (QoS) constraints must 
be guaranteed for each communication. This paper intends to provide insights regarding this 
tradeoff. 

In the non-ergodic scenario, most communication systems aim to guarantee, with high
probability, reliable communication (small error probability) at the target rate, no matter which
channel estimate arises during the communication. To this end, the system designer will use
the available CSI to appropriately allocate the available resources, e.g. power for transmission,
the amount of training used, etc. This scenario can be mathematically modeled as depicted
in Fig. 1. A specific instance of the channel is described by a conditional PD $\{W_\theta(y|x,s)\}$
where it is assumed that neither the transmitter nor the receiver knows exactly the channel states
($\theta \in \Theta$, $\mathbf{s} \in \mathscr{S}^n$). The channel state $\theta$ is randomly drawn ($\theta \sim \mu_\theta$) before the communication
starts and remains unchanged throughout the transmission, while the channel state $s$ is assumed
to change from letter to letter, drawn i.i.d. from $\mu_{S|\theta}$. We assume that $(\mathbf{s}, \mathbf{u}, \mathbf{v})$ is an i.i.d.
sequence over $\mathscr{S} \times \mathscr{U} \times \mathscr{V}$ with joint PD $\mu_{SUV|\theta}$, where the transmitter is provided¹ with

¹For simplicity, we assume that the same estimate $\hat\theta$ is available at the transmitter and at the receiver. Generalization to the case where the
estimate at the transmitter is different (due to imperfect feedback) is straightforward.

June 9, 2009 To appear in IEEE Transactions on Information Theory 



[Figure 1 (block diagram): message $m \in \mathcal{M}_n$ → Encoder → channel $W_\theta(y|x,s)$ → output $Y$ → Decoder → $\hat{m} \in \mathcal{M}_n$.]

Fig. 1. Block diagram of the channel with time-varying and fixed states and imperfect CSIT and CSIR.

noisy (possibly poor) estimates $(\hat\theta, \{u_i\}_{i=1}^{\infty})$ of $(\theta, \{s_i\}_{i=1}^{\infty})$, while the receiver knows estimates
$(\hat\theta, \{v_i\}_{i=1}^{\infty})$. The encoder and the decoder are both assumed to be aware of the laws governing
the channel variations $(\mu_{\hat\theta SUV|\theta}, \mu_\theta)$. We remark that even in non-ergodic scenarios, where the
channel does not have a fixed state $\theta$, the model depicted in Fig. 1 can be useful to exhibit channel
uncertainty arising on the statistic controlling the time-varying states $s$. Moreover, in this setting,
additional information is available from the accuracy statistic, which consists of the conditional
PD $\mu_{\theta SUV|\hat\theta}$ that can be derived from the estimation method and the PDs $(\mu_{\hat\theta SUV|\theta}, \mu_\theta)$. This

information can be used to measure, in terms of probabilities, how accurate the state estimates 
are (e.g. to compute its variance, any confidence interval, etc.). However, reliable communication 
cannot always be guaranteed, since extremely poor estimates (even if unlikely) are possible. We 
address this problem by introducing the notion of estimation-induced outage (EIO) capacity. 

B. Related Results 

We next recall further results that address closely related problems and we comment on their 
differences and similarities to our approach. 

Médard [15] derives capacity bounds for slow fading channels with additive white Gaussian
noise (AWGN) and minimum mean-square error (MMSE) channel estimation. The bounds
depend on the variance of the channel estimation error regardless of the estimation method.
Moreover, these results have been extended to ergodic fading channels in [16], [17]. More recent 
work by Yoo and Goldsmith [18] derives capacity lower bounds for MIMO fading channels. 
These capacity bounds, derived for special cases of Gaussian and MIMO channels, appear to 
be an instance of the general framework considered in the present work. This can be seen by 
assuming a non-ergodic channel model $\{W_\theta(y|x)\}$ (there are no time-varying states $s$), controlled
by an unknown state $\theta \in \Theta$, where an estimate $\hat\theta$ of $\theta$ and its accuracy statistic $\mu_{\theta|\hat\theta}$ are available
at the transmitter and at the receiver. We can consider the reliability function defined as the
average of the transmission error probability over all channel estimation errors (this will be
discussed later in Section II). It can be shown [19] that this notion of reliable communication
leads to the capacity of the composite channel:

$$\widetilde{W}(y|x,\hat\theta) = \int_{\Theta} W_\theta(y|x)\, d\mu(\theta|\hat\theta), \tag{2}$$

which results from the average of the unknown channel over the accuracy statistic, i.e., over all
possible states given the estimate $\hat\theta$. The maximal achievable rate (the capacity) with the reliability
function defined by the average error probability over all channel estimation errors is

$$C(\hat\theta) = \sup_{P_{\hat\theta} \in \mathscr{P}_\Gamma(\mathscr{X})} I(X; Y \mid \hat\theta), \tag{3}$$

where $I(X; Y \mid \hat\theta)$ is the mutual information evaluated for the composite channel (2)
with input distribution $P_{\hat\theta}$ in the set of admissible input distributions $\mathscr{P}_\Gamma(\mathscr{X})$. Expression
(3) represents the capacity for general DMCs with arbitrary estimation functions. For instance,
the bounds for Gaussian inputs and MMSE channel estimation found in [15] and [18] can be
derived as well from (3). However, note that Gaussian inputs are not optimal for maximizing this
capacity and that only lower and upper bounds are known. The proof of (3) directly follows from
Shannon's coding theorem [20], since the resulting error probability function can be defined in
terms of the composite channel [21]. Moreover, it was shown in [19] that the capacity in (3)
can be achieved by an ML decoder matched to the composite channel.
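
To make (2) and (3) concrete, the sketch below averages a finite family of channels over a posterior and evaluates the mutual information of the resulting composite channel. It assumes a finite state set, so the integral in (2) becomes a sum, and it fixes a uniform input instead of carrying out the supremum in (3); for binary symmetric channels the uniform input is known to be optimal. All function names and numerical values are hypothetical.

```python
from math import log

def composite_channel(channels, posterior):
    """Average W_theta(y|x) over a finite posterior mu(theta | theta_hat), cf. (2)."""
    n_x = len(channels[0])
    n_y = len(channels[0][0])
    return [[sum(w * W[x][y] for w, W in zip(posterior, channels))
             for y in range(n_y)] for x in range(n_x)]

def mutual_information(P, W):
    """Mutual information in nats for input PD P and channel matrix W[x][y]."""
    n_y = len(W[0])
    out = [sum(P[x] * W[x][y] for x in range(len(P))) for y in range(n_y)]
    return sum(P[x] * W[x][y] * log(W[x][y] / out[y])
               for x in range(len(P)) for y in range(n_y)
               if P[x] > 0 and W[x][y] > 0)

def bsc(p):
    """Binary symmetric channel with crossover probability p."""
    return [[1 - p, p], [p, 1 - p]]

# Given the estimate, the true crossover is 0.05 with prob. 0.9 or 0.4 with prob. 0.1.
W_bar = composite_channel([bsc(0.05), bsc(0.4)], [0.9, 0.1])
rate = mutual_information([0.5, 0.5], W_bar)
```

The composite channel is again a BSC whose crossover probability is the posterior mean of the individual crossovers, which is why this averaged rate says nothing about the rate supported by any single realization of the state.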

Consider now the case of ergodic models $\{W(y|x,s)\}$ controlled by an unknown sequence of
states $\{s_i\}_{i=1}^{\infty}$ (there is no fixed and unknown state $\theta$), where the transmitter and the receiver are
provided with the sequences $\{u_i\}_{i=1}^{\infty}$ and $\{v_i\}_{i=1}^{\infty}$, respectively, and these sequences are drawn
i.i.d. from the joint PD $\mu_{SUV}$. The results by Salehi [22], based again on the average error
probability, extend Shannon's result [23] by showing that the capacity in this case is

$$C = \sup_{P_T} I(T; Y \mid V), \tag{4}$$

where $T \in \mathscr{X}^{\|\mathscr{U}\|}$ is a random vector of length $\|\mathscr{U}\|$ with elements in $\mathscr{X}$ and $P_T$ is its PD. It is
appropriate to mention here that, from the results of [9], the problem of imperfect CSI for ergodic
channels with time-varying states is overcome by coding over expanded alphabets, where the
estimates known at the receiver are considered as an additional output $(y, v)$ and those known
at the transmitter as an additional input $(x, u)$. This simple argument stated in [9] shows that
(4) follows again from Shannon's result [23], so that no proof is actually needed.

The capacity notions as defined above, based on averaging the reliability function over all 
channel estimation errors, cannot guarantee reliable communication in non-ergodic scenarios, 
specifically when significant differences arise between the true state $\theta$ and its estimate. In other
words, the above notions consider a transmission successful if the "average" (over the ensemble
of states $\theta$ given the estimate) of the error probability is small. This is therefore not really
compatible with the constraints that are usually employed in non-ergodic scenarios, in which
one wants to characterize the capacity attained by all users who have access to the service. In
contrast, the notion of EIO capacity that we shall propose is closely related to that of outage
capacity, originally introduced in [24]. It is also connected to the notion of $\epsilon$-capacity, first
proved for a class of discrete stationary channels called regular decomposable (cf. [14], [25],
[26]). In the standard scenario of slowly fading AWGN channels, where the receiver is provided
with perfect CSI and there is no CSIT, this notion relies on the fact that there may be a
non-negligible probability that the value of the actual transmitted rate exceeds the instantaneous
mutual information and thus an outage event occurs. Hence the error probability does not decay
when the block-length increases. The capacity versus outage is then defined as the maximum
rate that can be supported with probability $1 - \gamma$, where $\gamma$ is a prescribed outage probability
used to exclude the outage events. Indeed, it has been shown that the outage probability matches 
well the error probability of practical codes (cf. [27], [28]). The general results by Verdú and
Han [29] provide a coding theorem for arbitrary channels in this setting. The $\epsilon$-capacity
($0 < \epsilon = \gamma < 1$) is given by [29]

$$C_\epsilon = \sup_{P_{\mathbf{X}}} \sup \{ R \ge 0 : F_{\mathbf{X}}(R, P_{\mathbf{X}}) \le \epsilon \}, \tag{5}$$
with the limit of cumulative distribution functions defined as follows 

where $W_\theta^n$ and $P_{\mathbf{X}}$ are the $n$-extensions of the channel and its input process. Note that the state
$\theta$ in (6) is considered as an additional channel output [30]. Moreover, the general expression (5)
holds also with imperfect CSIR by evaluating (6) with the channel (2), averaged over all state
estimation errors, and letting the state estimate $\hat\theta$ instead of $\theta$ be an additional channel output.

In non-ergodic scenarios, a transceiver using $(\hat\theta, \{u_i\}_{i=1}^{\infty}, \{v_i\}_{i=1}^{\infty})$ instead of $(\theta, \{s_i\}_{i=1}^{\infty})$
obviously might not support a desired information rate; even arbitrarily small rates might not
be supported if $\theta$ and $\hat\theta$ happen to be strongly different. As a consequence of this observation,
"outages" induced by channel estimation errors will occur with a certain probability. In this
paper, we introduce the notion of estimation-induced outage (EIO) capacity that is a function
of the outage probability $\gamma$ (a QoS parameter), the specific channel measurement $\hat\theta$ and the
joint accuracy statistic $\mu_{\theta SUV|\hat\theta}$. A single-letter characterization, evaluating the optimal trade-off
between the maximal EIO rate and the outage probability, will be derived.

C. Outline 

The remainder of this paper is organized as follows. In Section II the notion of EIO capacity
is first formalized for general DMCs and then a coding theorem is stated. Section III presents
the main steps of the proof of the coding theorem and its converse. An application example of a
non-ergodic fading Ricean channel with ML channel estimation is considered in Section IV. The
mean EIO capacity is compared to the achievable EIO rates of a system using the mismatched
ML decoder based on the state estimate. The effects of quantized feedback and power allocation
strategies are also considered. Section V provides numerical results to illustrate mean EIO rates.

II. Estimation-induced Outage Capacity and Coding Theorem 

In this section, we first develop a proper formalization of the notion of EIO capacity and state 
a coding theorem. 

A. Notation 

Throughout the next sections we use the following notation: $\mathscr{P}(\mathscr{X})$ denotes the set of all
atomic (or discrete) PDs on $\mathscr{X}$ with a finite number of atoms. The $n$-th Cartesian power $\mathscr{X}^n$
is defined as the sample space of $\mathbf{X} = (X_1, \dots, X_n)$, with $P_X^n$-probability mass determined in
terms of the $n$-th Cartesian power of $P_X$. The joint PD corresponding to the input $P_X \in \mathscr{P}(\mathscr{X})$
and the transition PD $W_\theta \in \mathcal{W}_\Theta$ is denoted as $W_\theta \circ P_X \in \mathscr{P}(\mathscr{X} \times \mathscr{Y})$ and its marginal
on $\mathscr{Y}$ is denoted as $W_\theta P_X \in \mathscr{P}(\mathscr{Y})$. The cardinality of the alphabets is denoted by $\|\cdot\|$, and
the complement of any set $\mathscr{A}$ is denoted by $\mathscr{A}^c$, while $\mathbb{1}\{\cdot\}$ denotes the indicator function. The
functionals $D(\cdot\|\cdot)$ and $H(\cdot)$ denote the Kullback-Leibler divergence and the entropy, respectively.

B. Problem Definition 

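A message $m$ from the set $\mathcal{M}_n = \{1, \dots, \lfloor \exp(n R_{\hat\theta}) \rfloor\}$ is transmitted using a length-$n$ block
code defined by a sequence of encoding functions $\varphi = \{\varphi_{\hat\theta,i} : \mathcal{M}_n \times \Theta \times \mathscr{U}^i \mapsto \mathscr{X}\}_{i=1}^{n}$
provided with states $(\hat\theta, u_1, \dots, u_i) \in \Theta \times \mathscr{U}^i$; the receiver uses a sequence of decoding functions
$\phi = \{\phi_{\hat\theta,i} : \mathscr{Y}^i \times \Theta \times \mathscr{V}^i \mapsto \mathcal{M}_n \cup \{0\}\}_{i=1}^{n}$ provided with states $(\hat\theta, v_1, \dots, v_i) \in \Theta \times \mathscr{V}^i$.
The maximum (over all messages) of the average (with respect to $\mathbf{s}, \mathbf{u}, \mathbf{v}, \mathbf{y}$) error probability,
which depends on the unknown state $\theta$, is defined as

$$e_{\max}^{(n)}(\varphi, \phi \mid \theta, \hat\theta) = \max_{m \in \mathcal{M}_n} \sum_{\mathbf{s} \in \mathscr{S}^n} \sum_{\mathbf{u} \in \mathscr{U}^n} \sum_{\mathbf{v} \in \mathscr{V}^n} \sum_{\mathbf{y} \in \mathscr{Y}^n} \mathbb{1}\{\phi(\mathbf{y}, \mathbf{v}) \neq m\}\, W_\theta^n(\mathbf{y} \mid \varphi(m, \mathbf{u}), \mathbf{s})\, \mu^n(\mathbf{s}, \mathbf{u}, \mathbf{v} \mid \theta, \hat\theta). \tag{7}$$

Each transmitted codeword must satisfy a transmission cost constraint (generalized power
constraint) $\mathbb{E}_{\mathbf{X}\mathbf{U}}\{\Phi_{\hat\theta}^n(\mathbf{X}, \mathbf{U})\} \le n\Gamma$, where $\Phi_{\hat\theta}^n(\mathbf{x}, \mathbf{u}) = \sum_{i=1}^{n} \Phi_{\hat\theta}(x_i, u_i)$ ($\Gamma \in \mathbb{R}_+$) for some cost
function $\Phi : \mathscr{X} \times \Theta \times \mathscr{U} \mapsto \mathbb{R}_+$. In the absence of a transmission cost constraint, we set
$\Gamma = \infty$.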

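Definition 2.1: For a given estimate $\hat\theta$ and $0 < \epsilon, \gamma < 1$, an EIO rate $R_{\hat\theta} \ge 0$ is said to be
$(\epsilon, \gamma)$-achievable on a DMC $\{W_\theta(y|x,s)\}$ if, for every $\delta > 0$ and sufficiently large $n$, there exists for
the unknown state $\theta$ a block code of length $n$ and size $M_{\theta\hat\theta}$ that supports an error probability
(7) smaller than $\epsilon$ with prescribed outage probability:

$$\Pr\big( \{\theta \in \Lambda_\epsilon^{(n)} : n^{-1} \log M_{\theta\hat\theta} \ge R_{\hat\theta} - \delta\} \,\big|\, \hat\theta \big) \ge 1 - \gamma, \tag{8}$$

where $\Lambda_\epsilon^{(n)} = \{\theta \in \Theta : e_{\max}^{(n)}(\varphi, \phi \mid \theta, \hat\theta) \le \epsilon\}$ is the set of all channel states allowing for reliable
decoding. In other words, the encoder uses a smaller codebook that guarantees maximum error
probabilities less than $\epsilon$ with probability at least $1 - \gamma$.

A rate $R_{\hat\theta} \ge 0$ is $\gamma$-achievable if it is $(\epsilon, \gamma)$-achievable for every $0 < \epsilon < 1$. Let $C_{\mathrm{EIO}}^{\epsilon}(\gamma, \hat\theta)$ be the
largest $(\epsilon, \gamma)$-achievable rate for an outage probability $\gamma$ and an estimate $\hat\theta$. The EIO capacity is
then defined as the largest $\gamma$-achievable rate as $\epsilon \to 0$, i.e., $C_{\mathrm{EIO}}(\gamma, \hat\theta) = \lim_{\epsilon \to 0} C_{\mathrm{EIO}}^{\epsilon}(\gamma, \hat\theta)$.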

We next compare the notion of reliable communication underlying the EIO capacity to the 
different reliability notions discussed in the introduction section. 

(i) The practical advantage of Definition 2.1 is that, for each transmission with an unknown
but fixed draw of $\theta$, the transmitter and the receiver are designed to guarantee the maximum
transmission rate for most states, while the worst ones are considered as "outages" with
probability $\gamma$. This provides more precise control over the reliability function (7) at the expense
of decreasing the information rate. Notice that the conventional capacity [15] discussed in Section
I, which only requires small "averaged" error probabilities

$$\bar{e}^{(n)}(\varphi, \phi \mid \hat\theta) = \mathbb{E}_{\theta|\hat\theta}\big\{ e_{\max}^{(n)}(\varphi, \phi \mid \theta, \hat\theta) \,\big|\, \hat\theta \big\} \le \epsilon, \tag{9}$$

does not guarantee small error probabilities for each transmission over a channel with an unknown
state $\theta$ (or non-ergodic component). In contrast, with EIO capacity the encoder and decoder
determine the most likely set of states $\theta$ using the available CSI $(\mathcal{W}_\Theta, \mu_{\theta SUV|\hat\theta}, \hat\theta)$ and construct
codes that perform well simultaneously for all states (channels) in that set. Hence, similarly to
the notion of $\epsilon$-capacity (5), this approach requires eliminating the worst (unlikely but possible)
states since these would yield zero capacity values. In contrast, similarly to the notion of ergodic
capacity [9], one can average the reliability function (7) over all channel estimation errors
corresponding to the time-varying states $\{s_i\}_{i=1}^{\infty}$ (or ergodic components). Related problems
regarding the capacity of compound and average quantum channels were reported in [31].

(ii) In the conventional definition of capacity versus outage, an outage event occurs when the
coding rate chosen is greater than the instantaneous mutual information. For instance, the
mutual information specifies the maximum rate with error-free communication². This definition
implicitly assumes that the state $\theta$ is available at the receiver as an additional output and therefore
cannot be directly extended to the case where only $\hat\theta$ is available. Indeed, notice that error-free communication cannot
be guaranteed in this setting even for the best realizations of $\theta$. Imperfect CSIR with no CSIT
can be directly considered via the $\epsilon$-capacity. To this end, one can average the channel over
all state estimation errors, which would yield the channel model (2) with additional channel
output $\hat\theta$, and then evaluate the general expression (5). In contrast, the EIO capacity allows
for imperfect CSIT and, roughly speaking, it is the maximal coding rate guaranteeing error-free
communication for $100(1 - \gamma)$ percent of the states $\theta$ given an estimate, according to the statistics
of estimation errors.

(iii) Assume that $\theta$ is available at the receiver (an additional output) and for simplicity suppose
that there are no time-varying states ($\mathscr{U} = \mathscr{V} = \mathscr{S} = \emptyset$). When $\hat\theta$ is independent of $\theta$, we observe
that the functional (8) reduces to the conventional $\epsilon$-capacity (5) (except for a set of states $\theta$
with zero measure [29]). This can be easily seen by noting that in this case, $\Lambda_\epsilon^{(n)} = \Theta$ and
the rate of the code coincides with the (instantaneous) mutual information. Hence, expression
(8) becomes $\Pr\big(\{\theta \in \Theta : I(X; Y_\theta \mid \theta) \ge R - \delta\}\big) \ge 1 - \gamma$, which equals (5) for memoryless
sequences $\{X_n\}_{n=1}^{\infty}$. The $\gamma$-capacity follows by taking the supremum over all rates $R \ge 0$.

²Here, error-free communication is understood in the sense of arbitrarily small error probabilities in the limit.

C. Coding Theorem 

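The next theorem quantifies the EIO capacity and provides an explicit way to evaluate the
maximal EIO rate versus the outage probability $\gamma$ for an arbitrary DMC controlled by a random
state sequence $(\theta, \{s_i\}_{i=1}^{\infty})$. The transmitter and the receiver are provided with the sequences of
state estimates $(\hat\theta, \{u_i\}_{i=1}^{\infty})$ and $(\hat\theta, \{v_i\}_{i=1}^{\infty})$, respectively, and with the joint statistic $\mu_{\theta SUV|\hat\theta}$.

Theorem 2.2 (EIO capacity): Given an outage probability $0 < \gamma < 1$ and sequences of state
estimates, the EIO capacity of an arbitrary DMC $\{W_\theta(y|x,s)\}$ is given by

$$C_{\mathrm{EIO}}(\gamma, \hat\theta) = \sup_{q_{TX|U\hat\theta} \in \mathscr{P}_\Gamma}\ \sup_{\Lambda \subset \Theta:\ \Pr(\Lambda|\hat\theta) \ge 1-\gamma}\ \inf_{\theta \in \Lambda} I(T; Y_\theta \mid V_\theta, \hat\theta), \tag{10}$$

where the set of admissible input PDs is defined as

$$\mathscr{P}_\Gamma = \Big\{ q_{TX|U\hat\theta} \in \mathscr{P}(\mathscr{T} \times \mathscr{X}) :\ q_{TX|U\hat\theta} = \mathbb{1}\{x = f_t(\hat\theta, u)\}\, P_{T|\hat\theta},\ \ T \to (X, S) \to Y_\theta\ \ \forall\, \theta \in \Theta,\ \ \mathscr{T} = \mathscr{X}^{\|\mathscr{U}\|},\ \ \mathbb{E}_{XU}\{\Phi_{\hat\theta}(X, U)\} \le \Gamma \Big\}$$

with mappings $\{f_t : \Theta \times \mathscr{U} \mapsto \mathscr{X}\}_{t \in \mathscr{T}}$. The inner
supremum in (10) is taken over all subsets $\Lambda \subset \Theta$ that have (conditional) probability at least
$1 - \gamma$ given the estimate, and the mutual information is given by

$$I(T; Y_\theta \mid V_\theta, \hat\theta) = \sum_{t \in \mathscr{T}} \sum_{v \in \mathscr{V}} \sum_{y \in \mathscr{Y}} P_{T|\hat\theta}(t|\hat\theta)\, \widetilde{W}_\theta(y, v \mid t, \hat\theta) \log \frac{\widetilde{W}_\theta(y, v \mid t, \hat\theta)}{\sum_{t' \in \mathscr{T}} P_{T|\hat\theta}(t'|\hat\theta)\, \widetilde{W}_\theta(y, v \mid t', \hat\theta)}, \tag{11}$$

where

$$\widetilde{W}_\theta(y, v \mid t, \hat\theta) = \sum_{u \in \mathscr{U}} \sum_{x \in \mathscr{X}} W_\theta(y, v \mid x, u, \hat\theta)\, \mathbb{1}\{x = f_t(\hat\theta, u)\}\, \mu(u \mid \theta, \hat\theta) \tag{12}$$

and the equivalent channel with inputs $(x, u, \hat\theta)$ and outputs $(y, v, \hat\theta)$ is given by

$$W_\theta(y, v \mid x, u, \hat\theta) = \sum_{s \in \mathscr{S}} W_\theta(y \mid x, s)\, \mu(s, v \mid u, \theta, \hat\theta). \tag{13}$$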
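
The sup-inf structure of the theorem can be sketched numerically under strong simplifying assumptions that are not made in the paper: a finite state set, no time-varying states, and a family of binary symmetric channels, for which the uniform input maximizes the mutual information of every member simultaneously. For a fixed input, the best admissible set of states is obtained by adding states in order of decreasing rate until their posterior mass reaches $1 - \gamma$; the resulting rate is that of the worst state included. The names `bsc_rate` and `eio_rate` and all numbers below are hypothetical, and `gamma=0.0` is included only as the limiting case in which no outage is tolerated.

```python
from math import log

def bsc_rate(p):
    """Mutual information (nats) of a BSC with crossover p under a uniform input."""
    if p in (0.0, 1.0):
        return log(2)
    return log(2) + p * log(p) + (1 - p) * log(1 - p)

def eio_rate(rates, posterior, gamma):
    """Sup over sets Lambda with Pr(Lambda | theta_hat) >= 1 - gamma of the
    infimum of `rates` over Lambda, for a finite state set and a fixed input PD."""
    order = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    mass = 0.0
    for i in order:  # grow Lambda with the best remaining state
        mass += posterior[i]
        if mass >= 1 - gamma - 1e-12:  # small tolerance for float round-off
            return rates[i]  # worst rate inside the smallest admissible Lambda
    raise ValueError("posterior must sum to one")

crossovers = [0.01, 0.05, 0.2, 0.5]   # hypothetical candidate states theta
posterior = [0.5, 0.3, 0.15, 0.05]    # hypothetical mu(theta | theta_hat)
rates = [bsc_rate(p) for p in crossovers]
r_strict = eio_rate(rates, posterior, gamma=0.0)   # no outage tolerated
r_outage = eio_rate(rates, posterior, gamma=0.05)  # 5% outage tolerated
```

With no outage tolerated, the useless state (crossover 0.5) must be covered and the rate collapses to zero; tolerating a 5% outage excludes that state and the rate is set by the crossover-0.2 channel.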
Comments: (i) The expression of the EIO capacity in Theorem 2.2 provides a general formula
for arbitrary state-dependent DMCs with imperfect CSI.

Using basic information-theoretic considerations, it can be seen that the capacities and
reliability measures discussed in the introduction are special cases of the EIO capacity. Therefore,
the EIO capacity can be viewed as a unification of the results in [9], developed for ergodic
channels with imperfect CSIT and CSIR, and the natural extension of the results in [29],
originally derived for general channels with no CSIT. We mention at this point that (10) can
be reached from the $\epsilon$-capacity (5) by letting the receiver know $(\theta, \hat\theta, v)$ while the transmitter
observes $(\hat\theta, u)$. Furthermore, if the transmitter has a noisy version $\tilde\theta$ of $\hat\theta$ (e.g. due to quantization
and/or feedback errors) while the receiver is aware of $(\hat\theta, \tilde\theta)$, Theorem 2.2 still holds with $\hat\theta$ in
(10) replaced by $\tilde\theta$.

(ii) The goal of the encoder and the decoder in the EIO capacity is to determine the set of
states $\Lambda^* \subset \Theta$ that maximize the information rate over the averaged channel (12) and
simultaneously have sufficiently high probability given the state estimate $\hat\theta$. Then the encoder
constructs codes that perform well simultaneously over all channel states $\theta \in \Lambda^*$. Hence, it
should be noted that this approach yields a compound setting [14], [32] of the averaged channel
investigated in [22]. This point of view can be complemented by the observation that compound
channels play the role of the simplest models for situations where channel uncertainty arises in
the non-ergodic components (the state $\theta$) of the channel statistic controlling the communication,
whereas averaged channels model scenarios in which the uncertainty is present in the ergodic
components (the time-varying states $s$) of the channel. Furthermore, from (10) we can observe
that the EIO capacity is not increased if the receiver, but not the encoder, is informed of the
true channel state $\theta$. This observation coincides as well with the capacity results for conventional
compound channels [1], [21] and quantum compound channels [33], [34].

(iii) The proof of Theorem 2.2 is based on bounding the minimum size of the image of a code
through a channel. The details of the proof and some associated technical aspects are relegated
to Section III and the appendices, respectively. Although there exist alternative ways of proving
this theorem, e.g. by using universal decoders (cf. [6], [7]), the present proof illustrates the
connections between EIO capacity and the capacity of conventional compound channels and
the manner in which the available CSI is exploited. Furthermore, it may provide useful
insights regarding practical code design. The generalized Maximal Code Lemma used in our
proof can be extended to more general models, for example mismatched decoders.

The generalization of the theorem and its proof to continuous alphabets is complicated by the 
fact that continuous-alphabet extensions of the concept of types (which is used in our proof) are 
not known [35]. Yet, there are several continuous-alphabet problems whose simplest (or only) 
solution relies upon the method of types via discrete approximations. Sanov's theorem has been
proved in this way in [36], and the capacity of Arbitrarily Varying Channels (AVC) with general alphabets
and states has been determined likewise (cf. [37]). A possible route for a generalization of
Theorem 2.2 to continuous alphabets is the use of the weak topology, requiring different tools
from measure theory and consideration of locally compact Hausdorff spaces, e.g. alphabets like
$\mathbb{R}^d$ (or $\mathbb{C}^d$) which are separable spaces. However, this extension is not considered in this paper.

D. Impact of Channel Estimation Errors on the EIO Capacity 

We now present a general upper bound on the rate loss with respect to the perfect CSI
scenario. To this end, we upper bound the rate difference between the EIO capacity (10) and the
ergodic capacity with perfect CSI. Notice that we compare to the ergodic capacity because with
high-accuracy estimations $\mu_{\theta|\hat\theta}$ becomes close to a Dirac distribution and thus the EIO capacity
approaches (with probability $\gamma$ close to one) the ergodic capacity. The following lemma
follows as a consequence of a theorem stated and proved in the appendix.

Lemma 2.3: Assume that the optimal set of channel states $\Lambda^* \subset \Theta$ obtained by maximizing
the EIO capacity (10) (over all sets of states $\Lambda \subset \Theta$ having probability at least $1 - \gamma$) defines
a convex set of conditional PDs $\mathcal{W}_{\Lambda^*} = \{W_\theta : \mathscr{X} \times \mathscr{U} \times \Theta \mapsto \mathscr{Y} \times \mathscr{V}\}_{\theta \in \Lambda^*}$. Let $\theta^* \in \Lambda^*$ be the
channel state that provides the infimum in expression (10). The following inequality holds

$$C_{\mathrm{EIO}}(\gamma, \hat\theta) \le C_e(\theta) - \Big[ D\big( (Y_\theta, V_\theta) \,\big\|\, (Y_{\theta^*}, V_{\theta^*}) \,\big|\, T, \hat\theta \big) \Big], \tag{14}$$

for any arbitrary state $\theta \in \Lambda^*$ and the corresponding input $T$ with PD $q_{TX|U\hat\theta} \in \mathscr{P}_\Gamma$ that
maximizes the expression of the EIO capacity in (10), where $C_e(\theta)$ denotes the ergodic capacity

$$C_e(\theta) = \sup I(X; Y_\theta \mid S_\theta).$$

Notice that the term in brackets on the right-hand side of the inequality (14) is positive. Moreover,
equality in (14) holds for all linear families of conditional PDs (or channels) $\mathcal{W}_{\Lambda^*}$.

III. Proof of the Coding Theorem and Its Converse 

In this section we approach the problem of determining optimal codes for achieving the EIO
capacity, according to its definition in Section II-B. The proof of Theorem 2.2 is based on a
generalization of the Maximal Code Lemma [38] to bound the minimum size of the image of
a code through the considered class of DMCs. Roughly speaking, the encoder uses the
available information to determine the most likely set of channel states and constructs codes that
perform well for all states in this set. Decoding is based on the union of I-typical sets, which
are called robust I-typical sets (see the Appendix).
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A. Generalized Maximal Code Lemma 

This section uses the notion of information-typical (I-typical) and conditionally I-typical sets,
defined in terms of the Kullback-Leibler divergence: $\mathcal{T}^n_{[P_T]_\delta} = \{t \in \mathcal{T}^n : D(\hat{P}_n \| P_T) \le \delta\}$ and
$\mathcal{T}^n_{[W_\theta]_\delta}(t) = \{(y, v) \in \mathcal{Y}^n \times \mathcal{V}^n : D(\hat{W}_n \| W_\theta | \hat{P}_n) \le \delta\}$ (for further details see Appendix I).
Furthermore, we define the set of states

A, = [eee: min W^(^;;,|x, u, ^) > 1 - e, forallmGM„|, (15) 

such that Pr(yl,|^ = ^) > 1 - 7. 
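
To make the notion of an I-typical set concrete, the following minimal sketch tests whether a finite-alphabet sequence is I-typical, i.e., whether the divergence between its empirical PD and a reference PD is at most a threshold. The function names (`empirical_pd`, `kl_divergence`, `is_i_typical`) are illustrative assumptions and not part of the paper; divergences are computed in nats.

```python
import math
from collections import Counter


def empirical_pd(seq, alphabet):
    """Empirical distribution (type) of `seq` over the given alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return {a: counts.get(a, 0) / n for a in alphabet}


def kl_divergence(p, q):
    """D(p || q) in nats; infinite if p puts mass where q has none."""
    d = 0.0
    for a, pa in p.items():
        if pa == 0.0:
            continue
        qa = q.get(a, 0.0)
        if qa == 0.0:
            return math.inf
        d += pa * math.log(pa / qa)
    return d


def is_i_typical(seq, pd, delta):
    """True if the empirical PD of `seq` is within divergence `delta` of `pd`."""
    alphabet = set(pd) | set(seq)  # symbols outside the support give D = inf
    return kl_divergence(empirical_pd(seq, alphabet), pd) <= delta
```

For a uniform binary reference PD, the balanced sequence `"0101"` has zero divergence, whereas the constant sequence `"0000"` has divergence $\ln 2$ and is rejected for any threshold below that value.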

Definition 3.1 (Admissible code): Given a state sequence {9, {si}°^^), the encoder and the de- 
coder are provided with the sequence of state estimates {9, {uij'^i) and (6', {vi}°Zi), respectively. 
The decoder reads 0g(y, v) = m iff m is the only message such that (y, v) G (^^ denotes the 
decoding set associated to message m [38]), while the encoder sends x = (p§{m, u) = ft^{9, u). 
For an arbitrary set % C n 7^r^^§^^ with P^^g{%) > r], < S,e,r],'y < 1 and an input PD 
{lTX\ue ^ ^r} with mappings {ft : O x 1 — > an admissible (n,e)-code has to 

satisfy the following requirements: 

(i) all codewords, depending only on the estimate 9, satisfy {'^m}.^^j^ C 

(ii) all decoding sets {^m}meM ^ (depending on 9) are mutually disjoint; 

(iii) for every m G M„, the decoding sets satisfy 

C ^^y^y^^rpQ^^{tm,9) = | ^ ( t „ , ^ ) . 

We now state a Fundamental Lemma, analogous to Feinstein's Lemma [39], and its converse,
which bound the size of any code through the considered time-varying DMCs with imperfect CSI.
The proof of Theorem 2.2 is immediate from this lemma.

Lemma 3.2 (Fundamental Lemma): For every $0 < \epsilon, \delta, \eta, \tau, \gamma < 1$ and channel estimate
9, every DMC {We : ^ x ^ x i — > '3^ x and admissible input PD {q^xiue ^ 
^r}, and every set % C 5^" satisfying P^^^(,%) > r], there exists a positive integer 
nodl^ll, ll^^ll, ||e5^||, ll^^ll, ||^||,e,5,?7) such that for all n>nQ the following statements hold. 

1) Direct part: There exists a set of states A* C and admissible (n, e) -codes with codeword 
set {tmj^gjyj C n 'J^^i^j and rate n^^logMg^, whose maximum error probability © is 
smaller than e for all G yl* and such that 

Fil^iOeA*: n-HogMgg> Rg-26}\e = e'^ > 1 - 7, (16) 

for all rates Rg < Ceio{.1, ^, Qtx\u9)- 

2) Converse part: For any admissible (n, e)-code of rate logMgg, whose maximum error 
probability is smaller than e for every 9 E A^, the largest code size satisfies 

Pr (^{6 G A, : n-HogM^^ > Rg + 26}\0 = 9^ < -f, (17) 

for all rates R§ > Ceio{i, ^> (lTx\ue)- 

The proof of Lemma 3.2 is obtained from basic properties of common $\eta$-images and the
concept of robust I-typical sets developed in Appendix II.

Definition 3.3 (Common images of sets via channels): A set C 'S^"' x is a common 
//-image (0 < r/ < 1) of a set ef^o C via the collection of simultaneous DMCs jW^ : 
^ X I — > ^ X y}g^^ if Wf^{^\t,e) > 7] for al\e e A and every t G %. Thus, the set of 
all ?7-images is defined by 

^w.(=^o,^,^) = C X : inf W^(,^|t,^) > r], for all t G %\ . 

The minimum of the cardinalities of all common //-images <^ is denoted as 

gwJ'^o,^,^) = min{||<^|| : ^ C ^^^(5^0, ^, r^)}. (18) 
Proof: It is not difficult to show the existence of a subset C ^ fl T" such that
P^^-{3^q) > r]/2 for sufficiently large n. Thus we can search for a codeword set {'tm},^^j^ C 
e^g'. From Lemma 12.51 (Appendix ^ we know that by choosing a confidence set yl C 6* with 
Pr(yl|^ = ^) > 1 - 7 every robust I-typical set (t, ^) C x t"^ forms a robust e- 

decoding set (Definition 12.21) for each codeword t G Consider an admissible (n, e)-code that 
is maximal, which means that it cannot be extended by arbitrary {tM^^+i, S>'m^^+\) '^^'^^ ^'^^ 
extended code remains admissible. Define the set = I J with C . (1^,6'), 

and choose r < e such that 1 — e > e — r. It follows that 

inf I min W^(^"|x, u, > e - r, for all m G M„. (19) 

For any t G \ {ti, . . . , Ia/^.} and for all 6* G /l, if 

min W^(^" ^ \ ^"|x, u, ^) > 1 - e, 

then the code would have an admissible extension, contradicting our initial assumption. Hence, 
for all t G 5^' \ {ti, . . . , tAf^J, we have 



inf I max (5;" ^ \ ^"|x, u, ^) | < 1 - e. 

The above expression and (19) imply for large enough $n$ that

inf W^(^"|t, ^) > (e - t)\ for all t G 5^o' \ {ti, • • • , tA^, J- (20) 

Inequalities (fT9l) and (|20|) actually imply that is a common (e — r)^-image of the set !Jq via 
the collection of DMCs {W^ : ^ x 6* i — ^ ^ x 1^}^^^. By definition of gw^(«^o ' - ^f) 
it follows that 

ll^"ll>gw.(«^^o,^,(e-r)^). (21) 
On the other hand, since C . (t„ 9) we have that

IYaVa\TO\s^ 



m=l 

< M,^ exp [n( sup H{Ye, Ve\T^, e = 9) + T)], (22) 

for sufficiently large $n$ and all $\theta \in \Lambda$, where the last inequality follows by applying the upper
bound of Theorem 2.6. By combining expressions (21) and (22), we have so far shown that
there exist admissible $(n, \epsilon)$-codes such that

n-MogM,^ > n-Moggw^(5'o',^, (e - rf) - snp H{Y0,Ve\T^,e = 9) - r, (23) 

for dX\ 9 E A and arbitrary set yl C 6* having probability at least 1 — 7. Let be the 
common (e — r)^-image of minimal size ||^"|| = g^^[^Q,9, (e — t)^). Then it can be seen 
that inf WeP" ^(^"■) > 'r]/2{t - r)^. By applying Lemma [TTT] (Appendix HI) to this relation and 
substituting it in (|23|) . we obtain 



n 



-1 



logM,^ > inf /(T,-; Ye, Ve\e = 9)- 2r, (24) 



for all 6* G /I and n > n'^, which follows by using the inequality n^^ logg,yy^ (^', ^, (e — r)^) > 
H(Y0, Ve\0 = 9) — T for all 9 E A. Finally, taking the supremum in (|24l) with respect to all sets 
A c O having probability at least 1 — 7 yields the lower bound (fT6l) 

n"^ logMgg^ > Ceio{i, ^, ^rxic/e) " 2r 

> i?^-2r, (25) 

for all Rq < CEio{l,(^,(lTX\ue) ^^'^ states 9 E A*, which is attained by setting A^ = A*. 
We now prove the second statement (converse part). For every 9 E A^ and set A^ c 9, let 
^ a^n ^ yn arbitrary set such that 



mm 



(#e"|x, u, ^) > e + r, for every m G (26) 
Hence, it follows from definition (15) and expression (26) that

W^(^;^ n ^^\tm, e) > r\ for m G M„, (27) 

provided by P^^|^^(5^^^|^^j |tm) > r for sufficiently large n. Using Corollary 1.2.14 in [40], 
we hence obtain 

min 11^;^ n %\ > exp [n(H{Ye, Ve\Tg, 6 = §) - r)], (28) 

for each 9 E A^, provided n is sufficiently large. Observe that we can set = ^ ^^^^ (t„, 9), 
which satisfies (l26l) for sufficiently large n and every m E M^. From the disjointness of decoding 
sets I I of every admissible code, we have that 

m=l 

> M,gexp [n(i/(F,,r,|T^,^ = ^) - r)], (29) 
for all $\theta \in \Lambda_\epsilon$, where the last inequality follows from (28). Thus, we have shown that

logM,^ < n-' log - H{Ye, Ve\Tg, 6 = 9) + r, (30) 

for all 9 E A^. Notice that since by assumption % C T" , , it follows that C 7" , and 
thus Proposition ll.4l -(iv) (see Appendix IJ) shows that there exists n > Uq such that 

n-Hogm<H{Ye,Ve\0 = 9) + T. (31) 

Hence, by applying (31) to (30) and then taking the supremum with respect to all sets $\Lambda \subset \Theta$
having probability at least $1-\gamma$, we obtain that for all $\theta \in \Lambda$

+ 2r, 

< Re + 25, (32) 
with i?g > Ceio{i, lTX\ue) ^''^^ Pr(^ ^ ^|^) < 7, which concludes the proof. ■ 
IV. Application Example: EIO Capacity of the Non-Ergodic Rician Fading Channel

In this section, we illustrate the results by evaluating the EIO capacity using Theorem 2.2.
We consider a simple but sufficiently rich framework that assumes communication over a
single-antenna wireless channel with a Rician block flat-fading model, where the channel state
$\theta \in \Theta$ is described by a single coefficient $\theta = H \in \mathbb{C}$ and there are no
time-varying states ($U = V = S = \emptyset$). The single channel state $\theta$ is assumed to remain constant
during the transmission of each codeword, but it is unknown at the transmitter and the receiver.
Each transmission is preceded by a channel training phase (which is short compared
to the coherence time). This consists in sending a training sequence of $N$ symbols,
which are perfectly known at the receiver. Thus, the receiver is able to perform ML or MMSE
estimation of $h$, yielding the noisy channel estimate $\hat{\theta} = \hat{h}$.

In many wireless systems, CSI at the transmitter is provided by the receiver via a feedback 
link, allowing the transmitter to use adaptive modulation and coding and to perform power 
control. We will consider the following three scenarios. 

• No feedback channel is available (i.e., absence of CSIT).

• A perfect feedback link is available (i.e., the transmitter knows the actual estimate $\hat{\theta}$). For
this case, we compare the EIO capacity with the EIO rates achievable with a receiver that
performs mismatched ML decoding based on $\hat{\theta}$.

• A rate-limited feedback link is available, i.e., a quantized version of the estimate $\hat{\theta}$ is sent
to the transmitter. The quantization codebook (designed using the well-known Lloyd-Max
algorithm [41]) is known at both the transmitter and the receiver.

A. Channel Model and Estimator Statistics

Consider a single-antenna block fading channel for wireless environments, given by

$$Y_i = H X_i + Z_i, \quad i = 1, \ldots, n, \tag{33}$$

where $Y_i \in \mathbb{C}$ is the received discrete-time signal, $X_i \in \mathbb{C}$ denotes the transmit signal, $H = h \in \Theta = \mathbb{C}$
is the channel realization and $Z_i \in \mathbb{C}$ is the additive noise. Each transmitted codeword
$\mathbf{x} = (x_1, \ldots, x_n)$ must satisfy the power constraint $E_X\{\|\mathbf{x}\|^2\} \le n P_{\hat{\theta}}$ with power $P_{\hat{\theta}}$. The noise
$Z_i$ is i.i.d. zero-mean circularly complex Gaussian (ZMCCG) with variance $\sigma_Z^2$. To model Rician
fading, the channel state $\theta = H$ is assumed to be circularly complex Gaussian with mean $\mu_H$ and
variance $\sigma_H^2$, i.e., $H \sim \mathcal{CN}(\mu_H, \sigma_H^2)$. The Rice factor is defined as $K_H = |\mu_H|^2/\sigma_H^2$. The channel
is a memoryless non-ergodic channel with conditional PD

$$W_H(y|x) = \mathcal{CN}(Hx, \sigma_Z^2) \quad \text{and} \quad \mathcal{W}_\Theta = \{W_{H=h}(y|x),\ h \in \Theta\}. \tag{34}$$

We employ the EIO capacity expression provided by (10) with the appropriate
transmission constraint, even though we provided a proof only for discrete input and output
alphabets (see the comments at the end of Section II-C).

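Since the channel coefficient $H = h$ is constant within a frame, channel estimation can
be performed on the basis of known training (pilot) symbols transmitted at the beginning
of each frame. Before sending the data $\mathbf{x}$, the transmitter sends a training sequence
$\mathbf{x}_T = (x_{T,1}, \ldots, x_{T,N})$. According to the observation model (33), this sequence is affected by $h$,
allowing the receiver to observe separately $\mathbf{y}_T = h\mathbf{x}_T + \mathbf{z}_T$, where $\mathbf{z}_T$ is the noise affecting the
transmission of training symbols. The average energy of the training symbols is $P_T = \frac{1}{N}\mathbf{x}_T\mathbf{x}_T^\dagger$.
Estimating $h$ in the ML sense given $\mathbf{y}_T$ and $\mathbf{x}_T$ amounts to minimizing $\|\mathbf{y}_T - h\mathbf{x}_T\|^2$ with
respect to $h$. This yields $\hat{h} = \mathbf{y}_T\mathbf{x}_T^\dagger(\mathbf{x}_T\mathbf{x}_T^\dagger)^{-1} = h + \mathcal{E}$, where $\mathcal{E} = \mathbf{z}_T\mathbf{x}_T^\dagger(\mathbf{x}_T\mathbf{x}_T^\dagger)^{-1}$ denotes the
estimation error. Next, we can write $\sigma_E^2 = \mathrm{SNR}_T^{-1}$ with $\mathrm{SNR}_T = N P_T/\sigma_Z^2$. Thus, the conditional
pdf of $\hat{\theta} = \hat{H}$ given $\theta = H$ is the circularly complex Gaussian pdf $\psi_{\hat{H}|H} = \mathcal{CN}(H, \sigma_E^2)$. Then,
by using some algebra and the a priori distribution $\psi_H$, the a posteriori distribution of $H$ given
$\hat{H}$ can be expressed as

$$\psi_{H|\hat{H}}(H|\hat{H}) = \frac{\psi_{\hat{H}|H}(\hat{H}|H)\,\psi_H(H)}{\int_\Theta \psi_{\hat{H}|H}(\hat{H}|H)\, d\psi_H(H)} = \mathcal{CN}(\tilde{\mu}, \delta\sigma_E^2), \tag{35}$$

where $\delta = \mathrm{SNR}_T\,\sigma_H^2/(\mathrm{SNR}_T\,\sigma_H^2 + 1)$ and $\tilde{\mu} = \delta\hat{H} + (1-\delta)\mu_H$. An alternative expression of (35) in terms
of the phase $\phi_H$ and the magnitude $r$ of $H = r\exp(j\phi_H)$ is given by

$$\psi_{H|\hat{H}}(r, \phi_H | \hat{H} = \hat{h}) = \frac{r}{\pi\delta\sigma_E^2} \exp\left(-\frac{r^2 + |\tilde{\mu}|^2 - 2r|\tilde{\mu}|\cos(\phi_H - \phi_\mu)}{\delta\sigma_E^2}\right), \tag{36}$$

where $\phi_\mu$ denotes the phase of $\tilde{\mu}$. The availability of the accuracy statistic (36) characterizing
the channel estimation errors is the key feature to compute the EIO capacity.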
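
The estimation step and the resulting accuracy statistic can be sketched numerically. The following minimal example computes the pilot-based ML estimate and the mean and variance of the Gaussian a posteriori distribution of the channel given its estimate, using standard Gaussian conditioning. The function and parameter names (`ml_estimate`, `posterior_params`, `var_h`, `var_e`, and so on) are illustrative assumptions, not notation from the paper.

```python
def ml_estimate(y_t, x_t):
    """ML (least-squares) channel estimate from pilots: y_T x_T^H / (x_T x_T^H)."""
    num = sum(y * x.conjugate() for y, x in zip(y_t, x_t))
    den = sum(abs(x) ** 2 for x in x_t)
    return num / den


def estimation_error_variance(n_pilots, p_t, var_z):
    """Estimation error variance 1 / SNR_T with SNR_T = N * P_T / var_z."""
    return var_z / (n_pilots * p_t)


def posterior_params(h_hat, mu_h, var_h, var_e):
    """Mean and variance of H given the estimate, for H ~ CN(mu_h, var_h)
    and an estimate h_hat = H + E with E ~ CN(0, var_e)."""
    delta = var_h / (var_h + var_e)
    mean = delta * h_hat + (1.0 - delta) * mu_h
    return mean, delta * var_e
```

With noise-free pilots the estimate equals the true coefficient. As the number of pilots grows, `var_e` tends to zero, so the posterior mean tends to the estimate and the posterior variance vanishes.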



B. EIO Capacity of the Non-ergodic Ricean Fading Channel 

Evaluating the EIO capacity (10) requires solving an optimization problem in which we have to
determine the optimum set $\Lambda_{\mathrm{opt}} \subset \Theta$ and its associated channel state $h_{\mathrm{opt}} \in \Lambda_{\mathrm{opt}}$ minimizing the
mutual information (11). However, in our case the mutual information
computed with (34) depends only on the absolute value $|h|$ of the channel coefficient. Thus,
the optimization over sets $\Lambda \subset \Theta$ of complex fading coefficients can be replaced with sets
$\Lambda_I = \{h \in \Theta : |h| \in I\}$, where $I$ denotes an arbitrary positive real interval.

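The conditional pdf $\psi_{r|\hat{H}}$ of the magnitude $r = |H|$ can be obtained by marginalizing (36), which results in the Ricean
distribution

$$\psi_{r|\hat{H}}(r|\hat{h}) = \frac{2r}{\delta\sigma_E^2} \exp\left(-\frac{r^2 + |\tilde{\mu}|^2}{\delta\sigma_E^2}\right) I_0\left(\frac{2r|\tilde{\mu}|}{\delta\sigma_E^2}\right), \tag{37}$$

where $\tilde{\mu}$ is the mean of the a posteriori distribution (35) and $I_0(\cdot)$ is the zeroth-order modified Bessel function of the first kind [42, Eq. (8.445)].
Consequently, given an $\epsilon > 0$, the optimization problem now reduces to finding the optimum
interval $I^{\epsilon}_{\mathrm{opt}} = [r^{\epsilon}_{\mathrm{opt}}, 1/\epsilon]$ such that the set $\Lambda^{\epsilon}_{\mathrm{opt}} = \{h \in \Theta : |h| \in I^{\epsilon}_{\mathrm{opt}}\}$ has probability $1-\gamma$
(computed with (37)) when $\epsilon \to 0$. This follows from the fact that the mutual information is
monotone increasing in $|h|$ while the intervals $I^{\epsilon}_{\mathrm{opt}}$ are convex and compact; thus the infimum
in the capacity expression equals the minimum over all $r$ ranging over the set $I^{\epsilon}_{\mathrm{opt}}$.
It follows that $r^*(\gamma, \hat{h})$ is the $\gamma$-percentile given by $\Pr(H \in \Lambda_{\mathrm{opt}} | \hat{H} = \hat{h}) = 1-\gamma$. This
probability can be computed from the pdf (37) as follows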



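$$\Pr(H \in \Lambda_{\mathrm{opt}} | \hat{H} = \hat{h}) = Q_1\left(\sqrt{\frac{2|\tilde{\mu}|^2}{\delta\sigma_E^2}},\ \sqrt{\frac{2\,(r^*(\gamma,\hat{h}))^2}{\delta\sigma_E^2}}\right), \tag{38}$$

where $\tilde{\mu}$ is the mean of the a posteriori distribution (35) and $Q_1(\alpha, \beta)$ is the first-order Marcum Q-function [43] (see Appendix IV). We note that the
mutual information corresponding to the considered channel is maximized by using ZMCCG
inputs with variance (transmit power) $P_{\hat{h}}$. Then, the EIO capacity can be shown to be given by

$$C_{\mathrm{EIO}}(\gamma, \hat{h}, P_{\hat{h}}) = \log_2\left(1 + \frac{(r^*(\gamma,\hat{h}))^2 P_{\hat{h}}}{\sigma_Z^2}\right), \tag{39}$$

by choosing $T = X$ in (10), where $r^{\epsilon}_{\mathrm{opt}} \to r^*$ as $\epsilon \to 0$.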

We remark that for fixed $\delta > 0$, $\Pr(|\hat{H} - H| > \delta \,|\, H = h) \to 0$ as $N \to \infty$. Thus, any
set $\Lambda_\delta = \{H \in \Theta : |H - \hat{h}| \le \delta\}$ contains a smaller and smaller neighborhood of the true
parameter $H$ and hence, by continuity, $C_{\mathrm{EIO}} \to \log_2\left(1 + |h|^2 P/\sigma_Z^2\right)$. Therefore, we
observe the expected result that the EIO capacity converges to the capacity with perfect CSI for
all $\gamma \in [0, 1]$, as the training sequence length $N$ tends to infinity.
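
The evaluation of the EIO capacity described in this subsection can be sketched in a few lines. The example below computes the first-order Marcum Q-function through its Poisson-mixture series (a standard identity for the noncentral chi-square distribution with two degrees of freedom), finds by bisection the magnitude that the channel exceeds with probability $1-\gamma$ given the estimate, and plugs it into the logarithmic rate expression. The function names and the truncation length `terms` are assumptions of this sketch, and the series is intended only for moderate arguments (roughly `a * a / 2` below a few hundred).

```python
import math


def marcum_q1(a, b, terms=300):
    """First-order Marcum Q-function Q_1(a, b) via a Poisson-mixture series."""
    lam, x = a * a / 2.0, b * b / 2.0
    pois = math.exp(-lam)   # Poisson(lam) weight at k = 0
    term = math.exp(-x)     # exp(-x) * x**j / j! at j = 0
    tail = term             # exp(-x) * sum_{j <= k} x**j / j!
    total = pois * tail
    for k in range(1, terms):
        pois *= lam / k
        term *= x / k
        tail += term
        total += pois * tail
    return total


def worst_case_gain(post_mean, post_var, gamma, tol=1e-10):
    """Magnitude r with Pr(|H| >= r | estimate) = 1 - gamma, where
    H | estimate ~ CN(post_mean, post_var)."""
    sigma = math.sqrt(post_var / 2.0)       # std. dev. per real dimension
    a = abs(post_mean) / sigma
    lo, hi = 0.0, abs(post_mean) + 10.0 * sigma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marcum_q1(a, mid / sigma) > 1.0 - gamma:
            lo = mid                        # still too likely: raise r
        else:
            hi = mid
    return 0.5 * (lo + hi)


def eio_capacity(post_mean, post_var, gamma, power, var_z):
    """log2(1 + r^2 * power / var_z) with r the worst-case gain above."""
    r = worst_case_gain(post_mean, post_var, gamma)
    return math.log2(1.0 + r * r * power / var_z)
```

A smaller outage probability forces a smaller worst-case magnitude and hence a lower rate, which is the trade-off the EIO capacity is meant to expose.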

C. Capacity of the Non-ergodic Ricean Fading Channel Based on Average Error Probability

For comparison, we also evaluate the capacity expression corresponding to the conventional
notion of reliable communication explained in Section I, which is based on the average of
the error probability over all channel estimation errors. We begin by computing the composite
channel model (2), which contains the channel estimation errors

$$\widetilde{W}(y|x, \hat{h}) = E_{H|\hat{H}}\{W_H(y|x) \,|\, \hat{H} = \hat{h}\} = \mathcal{CN}\big((\delta\hat{h} + (1-\delta)\mu_H)x,\ \sigma_Z^2 + \delta\sigma_E^2|x|^2\big). \tag{40}$$

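Then, it is not difficult to show that with Gaussian inputs the mutual information evaluated for
this composite channel yields the following expression:

$$\begin{aligned}
\widetilde{C}(\hat{h}, P_{\hat{h}}) &= H(Y|\hat{H} = \hat{h}) - H(Y|X, \hat{H} = \hat{h}) \\
&= \log_2\big(|\delta\hat{h} + (1-\delta)\mu_H|^2 P_{\hat{h}} + \sigma_Z^2 + \delta\sigma_E^2 P_{\hat{h}}\big) - E_{P_X}\big\{\log_2\big(\sigma_Z^2 + \delta\sigma_E^2|X|^2\big)\big\} \\
&= \log_2\left(1 + \frac{\big(|\delta\hat{h} + (1-\delta)\mu_H|^2 + \delta\sigma_E^2\big)P_{\hat{h}}}{\sigma_Z^2}\right) - \log_2(e)\,\exp\left(\frac{\sigma_Z^2}{\delta\sigma_E^2 P_{\hat{h}}}\right) E_1\left(\frac{\sigma_Z^2}{\delta\sigma_E^2 P_{\hat{h}}}\right),
\end{aligned} \tag{41}$$

where the last equality follows by calculating the expectation and $E_1(\cdot)$ denotes the exponential
integral function (see Appendix IV). Note that the first term in (41) provides an intuitive lower
bound, i.e.,

$$\widetilde{C}(\hat{h}, P_{\hat{h}}) \ge \log_2\left(1 + \frac{|\delta\hat{h} + (1-\delta)\mu_H|^2 P_{\hat{h}}}{\sigma_Z^2 + \delta\sigma_E^2 P_{\hat{h}}}\right), \tag{42}$$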
which follows by upper bounding the second term in (41) using Jensen's inequality and the
concavity of the log function. This lower bound parallels, for the considered estimation method,
that found in (|42)) . 
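
The gap between the composite-channel rate and its Jensen lower bound can be checked numerically. The sketch below evaluates the expectation of the logarithm of the effective noise power by midpoint integration (for a Gaussian input, the squared input magnitude is exponentially distributed with mean equal to the transmit power) and compares the resulting rate with the bound. It evaluates the expressions as stated in the text; the function names, the integration range `upper` and the step count `steps` are assumptions of this sketch.

```python
import math


def expected_log_noise(var_z, post_var, power, upper=50.0, steps=20000):
    """E{log2(var_z + post_var * |X|^2)} for |X|^2 = power * W, W ~ Exp(1),
    by midpoint integration of exp(-w) * log2(.) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        w = (i + 0.5) * h
        total += math.exp(-w) * math.log2(var_z + post_var * power * w)
    return total * h


def composite_rate(post_mean, post_var, power, var_z):
    """Output-entropy term minus expected log effective noise."""
    first = math.log2(abs(post_mean) ** 2 * power + var_z + post_var * power)
    return first - expected_log_noise(var_z, post_var, power)


def jensen_lower_bound(post_mean, post_var, power, var_z):
    """Bound obtained by moving the expectation inside the logarithm."""
    return math.log2(1.0 + abs(post_mean) ** 2 * power / (var_z + post_var * power))
```

When the posterior variance is zero (perfect CSI), both quantities reduce to the usual AWGN capacity $\log_2(1 + |h|^2 P/\sigma_Z^2)$.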

D. Achievable EIO Rates Associated to the Mismatched ML Decoder

Mismatched decoding arises when the decoder is restricted to use a prescribed "metric"
$d : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}_+$, which does not necessarily match the true channel (cf. [44], [5], [10]). Given
an output sequence $\mathbf{y} \in \mathcal{Y}^n$ and a channel estimate $\hat{h}$, we assume that decoding is performed
by using the mismatched ML metric, i.e., we set $d_{\hat{h}}(\mathbf{x}_j, \mathbf{y}) = \|\mathbf{y} - \hat{h}\mathbf{x}_j\|^2$. Hence, the decoder
declares that the codeword $\mathbf{x}_i \in \{\mathbf{x}_1, \ldots, \mathbf{x}_M\}$ was sent iff $d_{\hat{h}}(\mathbf{x}_i, \mathbf{y}) < d_{\hat{h}}(\mathbf{x}_j, \mathbf{y})$ for all $j \ne i$;
otherwise it declares an error. Obviously, suboptimal performance in terms of achievable EIO
rates is expected for this decoder, since it does not necessarily achieve the EIO capacity. However,
we aim at comparing the EIO capacity (39) (i.e., the ultimate limit) with the EIO rate $C_{\mathrm{EIO\text{-}ML}}$
achievable with the mismatched ML decoder.
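
As an illustration of the decoding rule just described, the following minimal sketch implements a mismatched ML (nearest-codeword) decoder that treats the channel estimate as if it were the true channel. The function name `mismatched_ml_decode` and the use of `None` to signal a declared error are assumptions of this sketch.

```python
def mismatched_ml_decode(y, codebook, h_hat):
    """Return the index of the unique codeword minimising ||y - h_hat * x||^2,
    or None (declared error) when the minimiser is not unique."""
    metrics = [
        sum(abs(yi - h_hat * xi) ** 2 for yi, xi in zip(y, x))
        for x in codebook
    ]
    best = min(metrics)
    winners = [j for j, m in enumerate(metrics) if m == best]
    return winners[0] if len(winners) == 1 else None
```

With a two-word antipodal codebook, an estimate whose phase is off by 180 degrees makes the decoder pick the wrong codeword even without noise, which illustrates why this decoder is sensitive to phase errors between the channel and its estimate.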

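The expression of achievable EIO rates associated to the mismatched ML decoder can be
obtained by combining the notion of EIO capacity with the previous results [5]:

$$C_{\mathrm{EIO\text{-}ML}}(\gamma, \hat{h}) = \max_{P_X \in \mathcal{P}_\Gamma}\ \sup_{\Lambda \subset \Theta:\ \Pr(\Lambda|\hat{H}=\hat{h}) \ge 1-\gamma}\ \inf_{\{H \in \Lambda,\ (\mu,\sigma) \in \mathcal{V}(H,\hat{h})\}} I(X; Y_{\mu,\sigma} \,|\, \hat{H} = \hat{h}), \tag{43}$$

where the set $\mathcal{V}(H, \hat{h}) = \{(\mu, \sigma) \in (\Theta \times \mathbb{R}_+) : E_{XY_{\mu,\sigma}}\{d_{\hat{h}}(x, y)\} \le E_{XY_H}\{d_{\hat{h}}(x, y)\}$, with
$P_{Y_{\mu,\sigma}}(y) = P_{Y_H}(y)$ a.s.$\}$ and the mutual information is evaluated for an arbitrary channel
$V_{\mu,\sigma}(\cdot|x) = \mathcal{CN}(\mu x, \sigma^2)$ with channel state $\mu \in \Theta$ and variance $\sigma^2$. Then, by computing the
mutual information and the minimization set, it follows that

$$\begin{aligned}
C_{\mathrm{EIO\text{-}ML}}(\gamma, \hat{h}) &= \sup_{\Lambda}\ \inf_{\{H \in \Lambda,\ \mu \in \mathbb{C}:\ \mathrm{Re}\{\mu\hat{h}^\dagger\} \ge \mathrm{Re}\{H\hat{h}^\dagger\}\}} \log_2\left(1 + \frac{|\mu|^2 P_{\hat{h}}}{(|H|^2 - |\mu|^2)P_{\hat{h}} + \sigma_Z^2}\right) \\
&= \sup_{\Lambda}\ \inf_{H \in \Lambda} \log_2\left(1 + \frac{r^2\cos^2(\phi_H - \phi_{\hat{h}})\,P_{\hat{h}}}{r^2\sin^2(\phi_H - \phi_{\hat{h}})\,P_{\hat{h}} + \sigma_Z^2}\right),
\end{aligned} \tag{44}$$

where the supremum is over all sets $\Lambda \subset \Theta$ with $\Pr(\Lambda|\hat{H} = \hat{h}) \ge 1-\gamma$ and the last equality follows by computing the minimizing value $\mu_{\mathrm{opt}} = \frac{\mathrm{Re}\{H\hat{h}^\dagger\}}{|\hat{h}|^2}\,\hat{h}$ with the
definitions $H = r\exp(j\phi_H)$ and $\hat{h} = \hat{r}\exp(j\phi_{\hat{h}})$. It should be noted that, in contrast to
the EIO capacity, here the achievable EIO rates associated to the mismatched ML decoder
are sensitive to phase errors between the channel and its estimate. For real-valued channels,
mismatched ML decoding entails no performance loss since (44) equals the capacity (39), and
thus the comparison with mismatched ML decoding would not make sense in that context.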

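Evaluating (43) further requires determining the optimal set $\Lambda_{\mathrm{opt}} \subset \Theta$ of channel states
maximizing (44) and whose probability is at least $1-\gamma$. By inspecting expression (44), it is not
difficult to see that this set is characterized by $\Lambda_{\mathrm{opt}} = \{H \in \Theta : |\phi_H - \phi_{\hat{h}}| \le \phi_E,\ |H| \ge r_{\mathrm{opt}}\}$
for some optimal values $(r_{\mathrm{opt}}, \phi_E)$ guaranteeing that $\Pr(\Lambda_{\mathrm{opt}}|\hat{H} = \hat{h}) = 1-\gamma$ for a given
channel estimate $\hat{h}$. We now need to evaluate the probability $\Pr(\Lambda_{\mathrm{opt}}|\hat{H} = \hat{h})$, which consists in
integrating the pdf (36) over the set $\Lambda_{\mathrm{opt}}$ of complex values, resulting in the following integral
expression

$$I(r_{\mathrm{opt}}, \phi_E) = \iint_{\Lambda_{\mathrm{opt}}} D\,r\exp\big(-A r^2 + B r\cos(\phi_H - \phi_\mu)\big)\,dr\,d\phi_H, \tag{45}$$

with constants $A = \frac{1}{\delta\sigma_E^2}$, $B = \frac{2|\tilde{\mu}|}{\delta\sigma_E^2}$ and $D = \frac{A}{\pi}\exp\left(-\frac{B^2}{4A}\right)$, where $\tilde{\mu}$ is the mean of the a posteriori distribution (35) and $\phi_\mu$ its phase. This integral can be numerically
evaluated (see Appendix IV) and thus the rate expression (43) can be written as

$$C_{\mathrm{EIO\text{-}ML}}(\gamma, \hat{h}) = \min_{\{(r_{\mathrm{opt}},\phi_E):\ I(r_{\mathrm{opt}},\phi_E) = 1-\gamma\}} \log_2\left(1 + \frac{r_{\mathrm{opt}}^2\cos^2(\phi_E)\,P_{\hat{h}}}{r_{\mathrm{opt}}^2\sin^2(\phi_E)\,P_{\hat{h}} + \sigma_Z^2}\right), \tag{46}$$

where the minimization is taken over all pairs $(r_{\mathrm{opt}}, \phi_E)$, defining the boundary of the region
$\Lambda_{\mathrm{opt}}$, for which $\Pr(\Lambda_{\mathrm{opt}}|\hat{H} = \hat{h}) = 1-\gamma$.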

E. Long-Term Power Allocation and Quantized CSI Feedback 

Next we concentrate on deriving optimal power allocation strategies $\{P_{\hat{H}} : \Theta \to \mathbb{R}_+\}$ for
maximizing the mean EIO capacity under the long-term constraint $E_{\hat{H}}\{P_{\hat{H}}\} \le P$, for the cases
of noiseless and noisy feedback. In this scenario, since each codeword experiences additive
white Gaussian noise, random Gaussian codes with multiple codebooks are optimal. Based on
the channel estimate available at the transmitter $\hat{\theta} = \hat{H}$ (respectively its quantized value $\tilde{\theta} = \tilde{H}$),
a codeword is sent at a power level $P_{\hat{H}}$ (respectively $P_{\tilde{H}}$) given by the optimal power allocation
function (cf. [9]). First consider the case of noiseless feedback (i.e., the transmitter knows $\hat{\theta} = \hat{H}$).
From (39) the mean EIO capacity is given by


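$$\bar{C}_{\mathrm{EIO}}(\gamma, P) = \sup_{P_{\hat{H}}:\ E_{\hat{H}}\{P_{\hat{H}}\} \le P} E_{\hat{H}}\left\{\log_2\left(1 + \frac{(r^*(\gamma,\hat{H}))^2 P_{\hat{H}}}{\sigma_Z^2}\right)\right\}, \tag{47}$$

where the supremum is over all non-negative power allocation functions $P_{\hat{H}}$ such that $E_{\hat{H}}\{P_{\hat{H}}\} \le P$.
Given a state measurement $\hat{H} = \hat{h}$, the transmitter selects a code with a power level $P_{\hat{h}}$
and uses $\hat{h}$ and $\psi_{H|\hat{H}}$ to compute (using (38)) the worst channel state $r^*(\gamma, \hat{h})$. Thus, the optimal
power allocation maximizing (47) is easily derived as the well-known water-filling solution [9],

$$P_{\hat{h}} = \left[\frac{1}{r_0} - \frac{\sigma_Z^2}{(r^*(\gamma,\hat{h}))^2}\right]^+, \tag{48}$$

where $r_0$ is the positive constant guaranteeing the power constraint $E_{\hat{H}}\{P_{\hat{H}}\} = P$ and
$[x]^+ = \max\{x, 0\}$.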
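
The water-filling allocation can be sketched for a finite set of channel-estimate values. In the example below, each estimate value is summarized by an effective gain (the squared worst-case magnitude divided by the noise variance) and a probability, and the water level is found by bisection so that the average power constraint is met with equality. The discretization, the function name `water_filling` and its parameters are assumptions of this sketch rather than the paper's procedure.

```python
def water_filling(gains, probs, avg_power, iters=200):
    """Powers P_i = max(level - 1/g_i, 0) with sum_i probs[i] * P_i = avg_power.

    gains: effective gains g_i > 0 (e.g. r_i**2 / var_z); probs: their probabilities.
    """
    if any(g <= 0 for g in gains):
        raise ValueError("all gains must be positive in this sketch")
    lo, hi = 0.0, avg_power + max(1.0 / g for g in gains)
    for _ in range(iters):
        level = 0.5 * (lo + hi)
        used = sum(p * max(level - 1.0 / g, 0.0) for g, p in zip(gains, probs))
        if used > avg_power:
            hi = level      # water level too high
        else:
            lo = level
    level = 0.5 * (lo + hi)
    return [max(level - 1.0 / g, 0.0) for g in gains]
```

For two equiprobable states with gains 1 and 4 and unit average power, the sketch returns powers 0.625 and 1.375. When one state is much weaker than the other, it receives no power at all.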

Consider now the situation in which the receiver quantizes the channel estimate $\hat{H}$ and sends it to the
transmitter over a rate-limited feedback link. Clearly, the performance is now a
function of the number of feedback bits $R_{\mathrm{FB}}$. The receiver selects a quantized value among
$M_{\mathrm{FB}} = \lfloor 2^{R_{\mathrm{FB}}} \rfloor$ possibilities in the quantization codebook, which is assumed to be also known
at the transmitter. This codebook is designed to minimize the mean-square error between the input and its
quantized value. We construct this codebook $Q[\hat{H}] \in \{\tilde{H}_1, \ldots, \tilde{H}_{M_{\mathrm{FB}}}\}$ by using the non-uniform
quantizer $Q[\cdot]$ designed with the well-known Lloyd-Max algorithm [41]. We remark that the
considered quantizer is not necessarily optimal for maximizing the EIO capacity. The reason is
that the cost function (not necessarily the MMSE) can exploit any channel invariance that may
be present in the communication model. Indeed, optimal design of quantized feedback is a vast
topic and the literature is large and growing (see [45]-[49] and references therein). However,
here we do not intend to design optimal feedback; the goal is to show how to incorporate limited
feedback in the EIO capacity. To capitalize on the rate-limited feedback, the EIO capacity
and its power allocation (48) should be modified accordingly.
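
A minimal sketch of the Lloyd-Max design mentioned above is given below for real scalar training samples (for instance, magnitudes of channel estimates). This is a simplification assumed here for brevity, since the estimates in the text are complex. The function names `feedback_levels` and `lloyd_max` and the quantile-based initialization are likewise assumptions of this sketch.

```python
import math


def feedback_levels(rate_bits):
    """Number of codebook entries for a feedback rate of `rate_bits` bits."""
    return math.floor(2 ** rate_bits)


def lloyd_max(samples, levels, iters=100):
    """Scalar Lloyd-Max codebook minimising the mean-square error on `samples`."""
    xs = sorted(samples)
    n = len(xs)
    # Initialise the codebook at evenly spaced sample quantiles.
    code = [xs[(2 * i + 1) * n // (2 * levels)] for i in range(levels)]
    for _ in range(iters):
        cells = [[] for _ in code]
        for x in xs:
            # Nearest-neighbour condition: assign x to the closest entry.
            j = min(range(levels), key=lambda k: (x - code[k]) ** 2)
            cells[j].append(x)
        # Centroid condition: each entry becomes the mean of its cell.
        new = [sum(c) / len(c) if c else code[j] for j, c in enumerate(cells)]
        if new == code:
            break
        code = new
    return code
```

With `rate_bits = 3` the codebook has 8 entries. On two well-separated clusters of samples and two levels, the iteration converges to the two cluster means.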

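Let $\tilde{H} = Q[\hat{H}]$ be the quantized value received at the transmitter corresponding to a channel
estimate $\hat{H}$. In this case, the mean EIO capacity with rate-limited feedback is given by

$$\bar{C}_{\mathrm{EIO}}(\gamma, P) = \sup_{P_{\tilde{H}}:\ E_{\tilde{H}}\{P_{\tilde{H}}\} \le P}\ \sum_{i=1}^{M_{\mathrm{FB}}} \log_2\left(1 + \frac{(r_{\mathrm{opt}}(\gamma,\tilde{h}_i))^2 P_{\tilde{h}_i}}{\sigma_Z^2}\right) \Pr(\tilde{H} = \tilde{h}_i), \tag{49}$$

where the supremum is over all non-negative power allocation functions $P_{\tilde{H}} : \{\tilde{H}_1, \ldots, \tilde{H}_{M_{\mathrm{FB}}}\} \to \mathbb{R}_+$
such that $\sum_i P_{\tilde{h}_i} \Pr(\tilde{H} = \tilde{h}_i) \le P$; here, $\Pr(\tilde{H} = \tilde{h}_i)$ is the probability that the estimate falls in
$\Lambda_{Q,i} = \{\hat{h} \in \Theta : Q[\hat{h}] = \tilde{h}_i\}$, the set of $\hat{h}$ yielding the quantized state $\tilde{h}_i$. The optimal
value $r_{\mathrm{opt}}(\gamma, \tilde{h}_i)$ in (49) can be computed by following the same steps as in (47) but according
to the pdf of $r$ given a quantized estimate $\tilde{h}$. It is immediate to see that the optimal power
allocation must satisfy the power constraint with equality, and thus

$$P_{\tilde{h}_i} = \left[\frac{1}{\tilde{r}_0} - \frac{\sigma_Z^2}{(r_{\mathrm{opt}}(\gamma,\tilde{h}_i))^2}\right]^+, \tag{50}$$

where $\tilde{r}_0$ is a positive constant ensuring the power constraint $E_{\tilde{H}}\{P_{\tilde{H}}\} = P$.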

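It remains to compute the accuracy statistic represented by the pdf of $r = |H|$ given
$\tilde{H} = \tilde{h}$, which characterizes the channel estimation and the quantization errors together, for
the quantization method under consideration. In order to derive the accuracy statistic needed
to compute the EIO capacity, we introduce the statistical model for quantization of channel
estimates. From rate distortion theory [50] and by considering the MMSE distortion, it is
not difficult to see that $\psi_{\hat{H}|\tilde{H}} = \mathcal{CN}(\tilde{h}, \sigma_Q^2)$, where the variance $\sigma_Q^2$ corresponds to the
quantization error of $\hat{H}$, which is encoded with $R_{\mathrm{FB}}$ bits per scalar symbol. According to this
and expression (35), we can compute the pdf $\psi_{H|\tilde{H}} = \mathcal{CN}(\tilde{\mu}_Q, \delta(\sigma_E^2 + \delta\sigma_Q^2))$ with
$\tilde{\mu}_Q = \delta\tilde{H} + (1-\delta)\mu_H$. Hence, the pdf required to derive the EIO capacity is given by

$$\psi_{r|\tilde{H}}(r|\tilde{h}) = \frac{2r}{\delta(\sigma_E^2 + \delta\sigma_Q^2)} \exp\left(-\frac{r^2 + |\tilde{\mu}_Q|^2}{\delta(\sigma_E^2 + \delta\sigma_Q^2)}\right) I_0\left(\frac{2r|\tilde{\mu}_Q|}{\delta(\sigma_E^2 + \delta\sigma_Q^2)}\right), \tag{51}$$

and its corresponding probability follows as

$$\Pr(H \in \Lambda_{\mathrm{opt}} | \tilde{H} = \tilde{h}) = Q_1\left(\sqrt{\frac{2|\tilde{\mu}_Q|^2}{\delta(\sigma_E^2 + \delta\sigma_Q^2)}},\ \sqrt{\frac{2\,(r_{\mathrm{opt}}(\gamma,\tilde{h}))^2}{\delta(\sigma_E^2 + \delta\sigma_Q^2)}}\right). \tag{52}$$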

V. Numerical Results 

In this section, numerical results are presented based on the capacity expressions evaluated
in Section IV, which correspond to different scenarios of a single-antenna non-ergodic Ricean
fading channel.

We first assume communications without long-term power constraints (no power control is
possible) and numerically evaluate: (i) the mean EIO capacity with perfect feedback, i.e., $\hat{H}$ is available
at both the transmitter and the receiver (expression (39)), (ii) the capacity corresponding to the
conventional notion of reliability based on the average error probability (expression (41)), and (iii) the
EIO capacity without CSIT (expression (5)); for comparison we also show the mean Shannon
capacity with perfect CSI. Fig. 2(a) shows these quantities (in bits per channel use) versus the
signal-to-noise ratio $\mathrm{SNR} = P/\sigma_Z^2$ for different outage probabilities $\gamma \in \{10^{-1}, 10^{-2}\}$. The
Rice factor was $K_H = 0$ dB, and the power and the length of the training pilots are $P_T = P$ and
$N = 1$, respectively. We observe that the mean EIO capacity is quite large, in spite of the short
training sequence. Achieving 2 bits with imperfect CSI ($\gamma = 0.01$) requires about 5.5 dB more
than in the case with perfect CSI. On the other hand, observe that the channel estimation errors
are still quite large with a single pilot symbol (e.g., $\sigma_E^2 = 1$ for SNR = 0 dB) and therefore the
notion of reliability based on the average error probability yields much higher rates that may
not be effectively supported in practical communication systems. It should be noted, however, that
by choosing an outage probability $\gamma = 0.1$ and at an SNR of 15 dB both notions of capacity
lead to the same rate. This scenario has been outlined in the introduction, showing that
the EIO capacity provides more precise control through $\gamma$ over the reliability function, at the
expense of decreasing the information rate.

Fig. 2(b) compares the following capacities versus the SNR, for $\gamma = 10^{-2}$: (i) the mean
EIO capacity with perfect feedback and (ii) the maximum EIO rate associated to the mismatched
ML decoder (expression (46)), for different amounts of training, together with the mean Shannon capacity
with perfect CSI. Observe that in order to achieve 2 bits, a scheme using imperfect CSI and
$N = 3$ pilot symbols (dotted line) requires 7 dB, i.e., 4 dB more than in the case with perfect CSI
(solid line). If the number of pilot symbols is further reduced to $N = 1$, this gap increases to
5 dB. In comparison, the mean EIO rates $\bar{C}_{\mathrm{EIO\text{-}ML}}$ corresponding to mismatched ML decoding
are significantly smaller than the EIO capacity. Indeed, in order to achieve the same
target rate, a communication system using the mismatched ML decoder would require a 2.5 dB
higher SNR. Thus, it follows that the accuracy of channel estimates provided by $N \in \{1, 3\}$
pilot symbols is not enough to allow for reliable decoding with the mismatched ML decoder.
However, if the number of pilot symbols is increased to $N = 10$, then this decoder can achieve
rates close to the EIO capacity.

We now consider communications with long-term power constraints, so that power allocation
functions are employed. The following scenarios are investigated: (i) the mean EIO capacity with
perfect feedback and optimal power allocation (expression (47)), (ii) the mean EIO capacity
with rate-limited feedback and optimal power allocation (expression (49)), and the ergodic
capacity with perfect CSI. Fig. 2(c) shows the mean EIO capacity for $\gamma = 0.01$ and rate-limited
feedback/CSIT versus the SNR. It is seen that the mean EIO rates increase with the number
of feedback bits. In the case of the ergodic capacity (perfect CSI with power allocation) the
SNR requirement for 2 bits is 2 dB (solid line), while 3 dB are required for the mean Shannon
capacity (no power allocation), 7 dB for the mean EIO capacity (imperfect CSI and power allocation) with
perfect feedback ($N = 1$ and $\gamma = 0.01$), and 9.5 dB without power allocation. Thus, in the presence
of a long-term power constraint the gap between the EIO capacity and the ergodic capacity is
slightly smaller than without such a constraint, i.e., 5 dB using power allocation as opposed to
6.5 dB without it. We observe that with rate-limited feedback larger gains can be obtained by
increasing the SNR. The gap between the mean EIO capacity for 1 bit of rate-limited feedback
and that for 3 bits is 5 dB (at a capacity of 2 bits), and this gap increases with the SNR. In
particular, a scheme using $R_{\mathrm{FB}} = 3$ bits of feedback achieves almost the same performance as
perfect feedback. Therefore, using this information a system designer may decide the number
of feedback bits required to achieve certain target rates.

Finally, we study the impact of the imperfect CSI on the mean EIO capacity for different
fading statistics, i.e., different Rice factors, and perfect feedback. Fig. 2(d) shows the mean EIO
capacity for Rice factors $K_H \in \{-15, 0, 25\}$ dB and amounts of training $N \in \{1, 3\}$. We observe
that increasing the Rice factor increases the impact of the estimation errors on the mean EIO
capacity. For a high value of $K_H = 25$ dB (i.e., smaller variance $\sigma_H^2$) the mean EIO capacity is not
sensitive to the amount of training. In contrast, for the smaller Rice factor $K_H = -15$ dB it is
more important to achieve accurate channel estimates. This observation can be understood from
the notion of EIO capacity, which depends on the trade-off between the estimation error $\sigma_E^2$ and
the variance of the fading process $\sigma_H^2$ (see expression (35)). Such analysis could serve as a basis
to decide in practical situations whether or not, depending on the nature of the fading process,
robust channel estimation is needed. The worst case is observed for the range of intermediate
Rice factors (i.e., $K_H \approx 0$ dB) since for these values the uncertainty about the accuracy of
estimates is maximal.

VI. Summary and Discussion 

In this paper we investigated the problem of reliable transmission over discrete memoryless 
channels (DMCs) when the receiver and the transmitter only know noisy estimates of the time- 
varying states and fixed states controlling the communication. We proposed to characterize the 
information theoretical limits of such scenarios in terms of the novel notion of estimation-induced 
outage (EIO) capacity. In this setting, the goal of the transmitter and the receiver is to construct 
codes, based on accuracy statistics for the channel states, to guarantee the desired communication 
service (achieving target rates with small error probability) whatever the quality of the estimates 
during the transmission. We provided a single-letter characterization of the optimal trade-off 
between the maximum achievable EIO rate and the outage probability (the QoS), by proving an 
associated coding theorem and its strong converse. The EIO capacity can be viewed as a unification
of several useful capacity notions for memoryless channel models with uncertainty regarding the
channel states.

A non-ergodic Ricean fading model is used to illustrate the above results by computing its 
mean EIO capacity. These results are useful for a system designer to assess the amount of 
training and feedback required to achieve target rates over a DMC with a given channel statistic. 
The maximum achievable EIO rate with Gaussian codebooks of a naive system whose receiver 
uses the mismatched ML decoder based on the channel estimate was also studied. Our results 
indicate that if the channel estimates are not precise enough (e.g. the training phase is too short) 
then this decoder can be largely suboptimal for the considered scenario. An improved decoder 
should use a metric based on maximum a posteriori (MAP) probability [19], [51], i.e., taking
into account the statistical nature of the state estimation errors. Moreover, the study of practical 
coding schemes satisfying the outage constraints, which perform close to the theoretical optimum 
given by the EIO capacity, is also a topic of interest. 

Possible direct applications of these results arise in practical communication systems with 
small training overhead and QoS constraints, such as OFDM or some MIMO systems. Another 
application scenario arises in the context of cellular coverage, where the average of EIO capacity 
would characterize performance over multiple communication sessions of many users with 
different geographic locations [52]. In that scenario, the system designer must guarantee a QoS 
during the connection session, i.e., reliable communication for $(1-\gamma)$-percent of users, even
for users with poor channel estimates. As a more challenging problem, it would be interesting 
to extend the EIO capacity to multiuser channels (e.g. MIMO broadcast channel and MIMO 
multiple access channel) with imperfect CSI. 
Appendix I

Information-typical Sets and Basic Properties 

Information (or Kullback-Leibler) divergence of PDs can be interpreted as a (non-symmetric)
analogue of Euclidean distance [53]. It entails the definition of I-typical sets, first suggested by
Csiszár and Narayan [54]. Several results for standard "strongly typical sets" can be extended
to "information-typical sets" [35].

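Throughout the appendices we use the following notation. The empirical PD $\hat{P}_n(\mathbf{x};\cdot) \in \mathcal{P}(\mathscr{X})$
associated with a sample sequence $\mathbf{x} = (x_1,\dots,x_n) \in \mathscr{X}^n$ is $\hat{P}_n(\mathbf{x};\mathscr{A}) = N(\mathscr{A}|\mathbf{x})/n$ with
$N(\mathscr{A}|\mathbf{x}) = \sum_{i=1}^{n} 1_{\mathscr{A}}(x_i)$, and $\hat{W}_n(\mathbf{x},\mathbf{y};\cdot|a)$ is the empirical conditional PD associated with $\mathbf{x}$
and $\mathbf{y} = (y_1,\dots,y_n) \in \mathscr{Y}^n$ for $a \in \mathscr{X}$. The set $\mathcal{P}_n(\mathscr{X}) \subset \mathcal{P}(\mathscr{X})$ denotes the set of
all rational point probability masses on $\mathscr{X}$ and its cardinality is bounded by $\|\mathcal{P}_n(\mathscr{X})\| \le (1+n)^{\|\mathscr{X}\|}$ [40].
We shall use the total variation or variational distance defined by $V(P,Q) = 2\sup_{\mathscr{A} \subseteq \mathscr{X}} |P(\mathscr{A}) - Q(\mathscr{A})|$.
Pinsker's inequality [40] for conditional PDs states that $V(W \circ P, V \circ P) \le \sqrt{\mathcal{D}(W\|V|P)/2}$.
Let $Q, P_X \in \mathcal{P}(\mathscr{X})$, then $Q$ is said to be absolutely continuous with respect to
$P_X$, denoted $Q \ll P_X$, if $Q(\mathscr{A}) = 0$ for every set $\mathscr{A} \subset \mathscr{X}$ for which $P_X(\mathscr{A}) = 0$. The support
of a conditional PD $\{W : \mathscr{X} \mapsto \mathscr{Y}\} \in \mathcal{P}(\mathscr{Y})$ with respect to the PD $P_X$ is defined as the set
$\mathrm{Sp}(W) = \{y : W(y|x) > 0 \text{ for all } P_X(x) > 0\}$. Let $\{W, V : \mathscr{X} \mapsto \mathscr{Y}\} \in \mathcal{P}(\mathscr{Y})$ be
two conditional PDs, then $\{W\}$ is said to be absolutely continuous with respect to $\{V\}$, written
$W \ll V$, if $\mathrm{Sp}(W) \subset \mathrm{Sp}(V)$. Thus, it follows that $\mathcal{D}(W\|V|P_X) < \infty$ iff $W \ll V$. Let $\mathcal{W}_\Lambda$ be
a convex set of PDs $\{W_\theta : \mathscr{X} \mapsto \mathscr{Y}\}_{\theta \in \Lambda} \in \mathcal{P}(\mathscr{Y})$; there is one PD whose support contains
all the others' supports, which is called the support of the set and is denoted by $\mathrm{Sp}(\mathcal{W}_\Lambda)$.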
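
The cardinality bound on the set of empirical PDs quoted above can be checked by brute force for small parameters. The following is a minimal sketch, assuming a ternary alphabet and block length 6 (both values are illustrative choices, not taken from the paper); the helper name `count_types` is hypothetical.

```python
from itertools import product
from math import comb


def count_types(n: int, k: int) -> int:
    """Count the distinct empirical PDs (types) of length-n sequences over k symbols."""
    types = set()
    for seq in product(range(k), repeat=n):
        # A type is fully described by the vector of symbol counts N(a|x).
        counts = tuple(seq.count(a) for a in range(k))
        types.add(counts)
    return len(types)


n, k = 6, 3                           # illustrative block length and alphabet size
num_types = count_types(n, k)
closed_form = comb(n + k - 1, k - 1)  # number of ways to split n into k counts
bound = (1 + n) ** k                  # the bound (1 + n)^{||X||}
print(num_types, closed_form, bound)
```

The exact count grows polynomially in n, which is the property the method of types exploits in the proofs below.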

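Definition 1.1 (Set of types): For any PD $P_n \in \mathcal{P}_n(\mathscr{X})$, the set of sequences $\mathbf{x} \in \mathscr{X}^n$ with
type $P_n$ is defined by $\mathcal{T}^n_{[P_n]} = \{\mathbf{x} \in \mathscr{X}^n : \mathcal{D}(\hat{P}_n\|P_n) = 0\}$, where $\hat{P}_n(\mathbf{x},\cdot)$ is the empirical
PD. Similarly, for a conditional PD $W_n(\cdot|x) \in \mathcal{P}_n(\mathscr{Y})$, the set of sequences $\mathbf{y} \in \mathscr{Y}^n$ with
type $W_n$ is defined by $\mathcal{T}^n_{[W_n]}(\mathbf{x}) = \{\mathbf{y} \in \mathscr{Y}^n : \mathcal{D}(\hat{W}_n\|W_n|\hat{P}_n) = 0\}$ for each $\mathbf{x} \in \mathscr{X}^n$, and
$\hat{W}_n(\mathbf{x},\mathbf{y};b|a)\,N(a|\mathbf{x}) = N(a,b|\mathbf{x},\mathbf{y})$ is the empirical conditional PD.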

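Definition 1.2 (Set of I-typical sequences): For any PD $P_X \in \mathcal{P}(\mathscr{X})$, the set of sequences
$\mathbf{x} \in \mathscr{X}^n$ called I-typical with constant $\delta > 0$ is defined by $\mathcal{T}^n_{[X]_\delta} = \{\mathbf{x} \in \mathscr{X}^n : \mathcal{D}(\hat{P}_n\|P_X) \le \delta\}$,
where $\hat{P}_n$ is the empirical PD such that $\hat{P}_n \ll P_X$. Similarly, for a conditional PD
$\{W : \mathscr{X} \mapsto \mathscr{Y}\} \in \mathcal{P}(\mathscr{Y})$, the set of sequences $\mathbf{y} \in \mathscr{Y}^n$ conditioned on $\mathbf{x} \in \mathscr{X}^n$ called conditional
I-typical with constant $\delta > 0$ is defined by $\mathcal{T}^n_{[Y|X]_\delta}(\mathbf{x}) = \{\mathbf{y} \in \mathscr{Y}^n : \mathcal{D}(\hat{W}_n\|W|\hat{P}_n) \le \delta\}$, where
$\hat{W}_n$ is the empirical conditional PD such that $\hat{W}_n \ll W$ (with respect to $\hat{P}_n$).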
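
As an illustration of Definition 1.2, the membership test for an I-typical set can be sketched as follows. This is a minimal sketch in which the source PD, the two test sequences and the constant 0.01 are illustrative assumptions; the helper names (`empirical_pd`, `kl_divergence`, `is_i_typical`) are hypothetical and the divergence is measured in nats.

```python
from collections import Counter
from math import log


def empirical_pd(x, alphabet):
    """Empirical PD of the sequence x over the given alphabet."""
    counts = Counter(x)
    n = len(x)
    return {a: counts[a] / n for a in alphabet}


def kl_divergence(p, q):
    """D(p||q) in nats; infinite when p is not absolutely continuous w.r.t. q."""
    d = 0.0
    for a, pa in p.items():
        if pa == 0.0:
            continue
        if q.get(a, 0.0) == 0.0:
            return float("inf")
        d += pa * log(pa / q[a])
    return d


def is_i_typical(x, p_x, delta):
    """True when the divergence of the empirical PD of x from p_x is at most delta."""
    p_hat = empirical_pd(x, p_x.keys())
    return kl_divergence(p_hat, p_x) <= delta


p_x = {"a": 0.5, "b": 0.3, "c": 0.2}  # illustrative source PD
x_typ = "aaaaabbbcc"                  # empirical PD coincides with p_x
x_atyp = "cccccccccc"                 # empirical PD far from p_x
print(is_i_typical(x_typ, p_x, 0.01), is_i_typical(x_atyp, p_x, 0.01))
```

Unlike a strongly typical set, the test uses a single scalar (the divergence) rather than a per-symbol deviation.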

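Lemma 1.3 (Uniform continuity of the entropy function): Let $P, Q \in \mathcal{P}(\mathscr{X})$ be two PDs and
$\{W, V : \mathscr{X} \mapsto \mathscr{Y}\} \in \mathcal{P}(\mathscr{Y})$ be two conditional PDs. Then, from Lemma 1.2.7 [40],

(i) If $V(P,Q) \le \Theta \le 1/2$, then $|H(X_P) - H(X_Q)| \le -\Theta \log \dfrac{\Theta}{\|\mathscr{X}\|}$.

(ii) If $V(V \circ P, W \circ P) \le \Theta \le 1/2$, then $|H(Y_V|X_P) - H(Y_W|X_P)| \le -\Theta \log \dfrac{\Theta}{\|\mathscr{X}\|\|\mathscr{Y}\|}$.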
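
Part (i) of Lemma 1.3 can be checked numerically on a small example. The sketch below assumes two illustrative PDs on a ternary alphabet (values chosen for this example only), uses natural logarithms, and takes the variational distance as the sum of absolute differences; the function names are hypothetical.

```python
from math import log


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * log(q) for q in p if q > 0)


def l1_distance(p, q):
    """Variational distance taken as the sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def entropy_gap_bound(theta, alphabet_size):
    """Right-hand side of Lemma 1.3 (i): -theta * log(theta / ||X||)."""
    return -theta * log(theta / alphabet_size)


P = [0.5, 0.3, 0.2]    # illustrative PDs
Q = [0.4, 0.35, 0.25]
theta = l1_distance(P, Q)
gap = abs(entropy(P) - entropy(Q))
bound = entropy_gap_bound(theta, len(P))
print(theta, gap, bound)
```

For these values the entropy gap is roughly an order of magnitude below the bound, which is typical: the lemma is a worst-case statement over all pairs at a given distance.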



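Proposition 1.4: (Properties of I-typical sequences)

(i) Any sequence $\mathbf{x} \in \mathcal{T}^n_{[X]_\delta}$ implies $V(\hat{P}_n, P) \le \sqrt{\delta/2}$. Moreover, any sequence $\mathbf{y} \in \mathcal{T}^n_{[Y|X]_\delta}(\mathbf{x})$
implies $V(\hat{W}_n \circ \hat{P}_n, W \circ \hat{P}_n) \le \sqrt{\delta/2}$, for all $\mathbf{x} \in \mathscr{X}^n$.

(ii) There exist sequences $(\delta_n)_{n \in \mathbb{N}_+}$ and $(\delta'_n)_{n \in \mathbb{N}_+}$ in $\mathbb{R}_+$ (which only depend on $\|\mathscr{X}\|$, $\|\mathscr{Y}\|$)
such that $(\delta_n, \delta'_n) \to 0$ and $n \log^{-1}(n+1)\,(\delta_n, \delta'_n) \to \infty$ as $n \to \infty$, so that for every $P_X \in \mathcal{P}(\mathscr{X})$
and $\{W : \mathscr{X} \mapsto \mathscr{Y}\} \in \mathcal{P}(\mathscr{Y})$, $P_X^n(\mathcal{T}^n_{[X]_{\delta_n}}) \ge 1 - \epsilon_n$ and $W^n(\mathcal{T}^n_{[Y|X]_{\delta'_n}}(\mathbf{x})|\mathbf{x}) \ge 1 - \epsilon'_n$ with

$$\epsilon_n = \exp\{-n(\delta_n - n^{-1}\|\mathscr{X}\| \log(n+1))\},$$

$$\epsilon'_n = \exp\{-n(\delta'_n - n^{-1}\|\mathscr{X}\|\|\mathscr{Y}\| \log(n+1))\}.$$

Note that $\log(n+1) < \sqrt{n}$ and consequently these sequences converge to zero with a rate higher
than that obtained for strongly typical sets [35].

(iii) Given $P, Q \in \mathcal{P}(\mathscr{X})$, $\{W, V : \mathscr{X} \mapsto \mathscr{Y}\} \in \mathcal{P}(\mathscr{Y})$ and $\delta > 0$, we have that

$$|H(X_Q) - H(X_P)| \le -\sqrt{\delta/2}\, \log \frac{\sqrt{\delta/2}}{\|\mathscr{X}\|} \quad \text{if } \mathcal{D}(Q\|P) \le \delta,$$

$$|H(Y_W|X_P) - H(Y_V|X_P)| \le -\sqrt{\delta/2}\, \log \frac{\sqrt{\delta/2}}{\|\mathscr{X}\|\|\mathscr{Y}\|} \quad \text{if } \mathcal{D}(W\|V|P) \le \delta.$$

(iv) There exist sequences $(\epsilon_n)_{n \in \mathbb{N}_+}$ and $(\epsilon'_n)_{n \in \mathbb{N}_+}$ in $\mathbb{R}_+$ with $(\epsilon_n, \epsilon'_n) \to 0$, as well as in (ii),
so that for every $P_X \in \mathcal{P}(\mathscr{X})$ and $\{W : \mathscr{X} \mapsto \mathscr{Y}\} \in \mathcal{P}(\mathscr{Y})$, we have that

$$\left| \frac{1}{n} \log \|\mathcal{T}^n_{[X]_{\delta_n}}\| - H(X) \right| \le \epsilon_n,$$

$$\left| \frac{1}{n} \log \|\mathcal{T}^n_{[Y|X]_{\delta'_n}}(\mathbf{x})\| - H(Y|X) \right| \le \epsilon'_n \quad \text{for each } \mathbf{x} \in \mathcal{T}^n_{[X]_{\delta_n}}.$$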



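Proof: Assertion (i) immediately follows from Pinsker's inequality. Assertion (iii) follows
from (i) and Lemma 1.3. Assertion (iv) immediately follows by defining I-typical sets using
$(\delta_n, \delta'_n)$-sequences and from the claim (iii), where the existence of such sequences was proved
in (ii). In order to prove the claim (ii), it is sufficient to show the second expression:

$$W^n\big(\mathscr{Y}^n \setminus \mathcal{T}^n_{[Y|X]_{\delta'_n}}(\mathbf{x}) \,\big|\, \mathbf{x}\big) = \sum_{\{V_n :\, \mathcal{D}(V_n\|W|\hat{P}_n) > \delta'_n,\ V_n \ll W\}} W^n\big(\mathcal{T}^n_{[V_n]}(\mathbf{x}) \,\big|\, \mathbf{x}\big)$$

$$\le \sum_{\{V_n :\, \mathcal{D}(V_n\|W|\hat{P}_n) > \delta'_n,\ V_n \ll W\}} \exp\big[-n \mathcal{D}(V_n\|W|\hat{P}_n)\big]$$

$$\le (1+n)^{\|\mathscr{X}\|\|\mathscr{Y}\|} \exp(-n\delta'_n) = \epsilon'_n,$$

for each $\mathbf{x} \in \mathscr{X}^n$. ■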
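
To make the rate in Proposition 1.4 (ii) concrete, the bound $\epsilon'_n$ can be evaluated for one admissible sequence. The sketch below assumes the choice $\delta'_n = n^{-1/4}$ and binary alphabets; both are illustrative assumptions satisfying $\delta'_n \to 0$ and $n\delta'_n/\log(n+1) \to \infty$, not values taken from the paper.

```python
from math import exp, log


def delta_seq(n: int) -> float:
    """Hypothetical delta-sequence: tends to 0 while n * delta / log(n + 1) diverges."""
    return n ** -0.25


def eps_prime(n: int, size_x: int, size_y: int) -> float:
    """epsilon'_n = exp{-n (delta'_n - ||X|| ||Y|| log(n + 1) / n)}."""
    return exp(-n * (delta_seq(n) - size_x * size_y * log(n + 1) / n))


for n in (10, 100, 1000, 10000):
    print(n, delta_seq(n), eps_prime(n, 2, 2))
```

For very short blocks the expression exceeds one and the bound is vacuous; it becomes informative once the polynomial type-counting factor is dominated by the exponential term.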
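Lemma 1.5 (Uniform continuity of I-divergences): (i) Given conditional PDs $\{W, Z, V : \mathscr{X} \mapsto \mathscr{Y}\} \in \mathcal{P}(\mathscr{Y})$
such that $W, Z \ll V$ with respect to some PD $P_X \in \mathcal{P}(\mathscr{X})$. For
each $\epsilon > 0$, if $\mathcal{D}(Z\|W|P_X) \le \epsilon$ there exists $\delta > 0$ such that $|\mathcal{D}(Z\|V|P_X) - \mathcal{D}(W\|V|P_X)| \le \delta$
with $\delta = -\sqrt{\epsilon/2}\, \log\big(\sqrt{\epsilon/2} / (\|\mathscr{X}\|\|\mathscr{Y}\|^2)\big) \to 0$ as $\epsilon \to 0$.

(ii) Similarly, given PDs $Q, Z, P_X \in \mathcal{P}(\mathscr{X})$ such that $Q, Z \ll P_X$. For each $\epsilon > 0$, if
$\mathcal{D}(Z\|Q) \le \epsilon$ then there exists $\delta' > 0$ such that $|\mathcal{D}(Z\|P_X) - \mathcal{D}(Q\|P_X)| \le \delta'$ with
$\delta' = -\sqrt{\epsilon/2}\, \log\big(\sqrt{\epsilon/2} / \|\mathscr{X}\|^2\big) \to 0$ as $\epsilon \to 0$.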
Proof: We only prove the first statement, since then (ii) follows immediately from (i). We
observe that, from Proposition 1.4 (i) and Lemma 1.3, $\mathcal{D}(Z\|W|P_X) \le \epsilon$ implies
$|H(Y_Z|X_P) - H(Y_W|X_P)| \le -\sqrt{\epsilon/2}\, \log \dfrac{\sqrt{\epsilon/2}}{\|\mathscr{X}\|\|\mathscr{Y}\|}$. Thus, the proof follows by considering the inequalities

$$|\mathcal{D}(Z\|V|P_X) - \mathcal{D}(W\|V|P_X)| \le |H(Y_Z|X_P) - H(Y_W|X_P)| + \sum_{a}\sum_{b} P_X(a)\,|Z(b|a) - W(b|a)|\, \log\|\mathscr{Y}\|$$

$$\le -\sqrt{\epsilon/2}\, \log\big(\sqrt{\epsilon/2} / (\|\mathscr{X}\|\|\mathscr{Y}\|)\big) + \sqrt{\epsilon/2}\, \log\|\mathscr{Y}\| = \delta.$$



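Lemma 1.6 (Large probability of I-typical sets): Let $\mathcal{T}^n_{[X]_{\delta_n}}$ and $\mathcal{T}^n_{[Y|X]_{\delta'_n}}$ be I-typical and
conditional I-typical sets, respectively. The probability that a sequence does not belong to these
sets converges to zero,

$$\lim_{n \to \infty} P_X^n\big(\mathscr{X}^n \setminus \mathcal{T}^n_{[X]_{\delta_n}}\big) = 0, \qquad \lim_{n \to \infty} W^n\big(\mathscr{Y}^n \setminus \mathcal{T}^n_{[Y|X]_{\delta'_n}}(\mathbf{x}) \,\big|\, \mathbf{x}\big) = 0, \quad \text{for every } \mathbf{x} \in \mathscr{X}^n.$$

Furthermore, $\mathcal{D}(\hat{P}_n\|P_X) \to 0$ and $\mathcal{D}(\hat{W}_n\|W|\hat{P}_n) \to 0$ as $n \to \infty$ with probability 1.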
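Proof: We observe from assertion (ii) in Proposition 1.4 that

$$W^n\big([\mathcal{T}^n_{[Y|X]_{\delta'_n}}(\mathbf{x})]^c \,\big|\, \mathbf{x}\big) \le \exp\big[-n\big(\delta'_n - n^{-1}\|\mathscr{X}\|\|\mathscr{Y}\| \log(n+1)\big)\big],$$

for every $\mathbf{x} \in \mathscr{X}^n$, and this expression goes to zero as $n \to \infty$. The second assertion
follows from the fact that $\sum_{n=1}^{\infty} W^n\big(\{\mathbf{y} \in \mathscr{Y}^n : \mathcal{D}(\hat{W}_n\|W|\hat{P}_n) > \delta'_n\} \,\big|\, \mathbf{x}\big) < \infty$, and by applying
the Borel-Cantelli Lemma [55], we obtain $\Pr\big(\limsup_{n \to \infty} \{\mathcal{D}(\hat{W}_n\|W|\hat{P}_n) > \delta'_n\} \,\big|\, \mathbf{x}\big) = 0$, which
concludes the proof. ■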
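Lemma 1.7: Given $0 < \eta < 1$, $P_X \in \mathcal{P}(\mathscr{X})$ and the set of conditional PDs $\{W_\theta : \mathscr{X} \mapsto \mathscr{Y}\}_{\theta \in \Lambda}$
for some set $\Lambda \subset \Theta$. Then there exist sequences $(\epsilon_n)_{n \in \mathbb{N}_+}$ and $(\epsilon'_n)_{n \in \mathbb{N}_+}$ in $\mathbb{R}_+$ with
$(\epsilon_n, \epsilon'_n) \to 0$, which only depend on $\|\mathscr{X}\|$, $\|\mathscr{Y}\|$ and $\eta$, so that:
(i) if $\inf_{\theta \in \Lambda} W_\theta P^n(\mathscr{A}) \ge \eta$ for $\mathscr{A} \subset \mathscr{Y}^n$, then $\frac{1}{n} \log \|\mathscr{A}\| \ge \sup_{\theta \in \Lambda} H(Y_\theta) - \epsilon_n$,
(ii) if $\inf_{\theta \in \Lambda} W_\theta^n(\mathscr{A}|\mathbf{x}) \ge \eta$ for $\mathscr{A} \subset \mathscr{Y}^n$ and $\mathbf{x} \in \mathcal{T}^n_{[X]_{\delta_n}}$, then $\frac{1}{n} \log \|\mathscr{A}\| \ge \sup_{\theta \in \Lambda} H(Y_\theta|X) - \epsilon'_n$.
This Lemma simply follows from the proof of Corollary 1.2.14 in [40] and the previous Lemmata.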



Appendix II 
Auxiliary Results 

This appendix introduces a few concepts shedding more light on the encoder and decoder
required to achieve the EIO capacity and furthermore provides some auxiliary technical results
needed for the proof of the Generalized Maximal Code Lemma 3.2 in Section III.

Unfeasibility of Mismatched Typical Decoding: Consider a DMC and its (noisy) estimate
$\{W_\theta, W_{\hat\theta} : \mathscr{T} \mapsto \mathscr{Y} \times \mathscr{V}\}$. The following Lemma shows that an I-typical decoder based
on $\{W_{\hat\theta}\}$ yields an error probability that approaches one when $W_\theta \ne W_{\hat\theta}$. This reveals that
conventional I-typical sets with respect to $\{W_{\hat\theta}\}$ merely specify some local structure in a small
neighborhood of $\{W_{\hat\theta}\}$ but not in the whole space (as outlined in [56]). This fact does not
establish that any I-typical decoder is not useful for decoding with imperfect CSI, but it shows
that there are no decoding sets $\{\mathscr{D}_i^n \subset \mathcal{T}^n_{[W_{\hat\theta}]_{\delta_n}}(\mathbf{t}_i)\}$ and codewords $\{\mathbf{t}_i\} \subset \mathcal{T}^n_{[T]_{\delta_n}}$ such that
$W_\theta^n(\mathscr{D}_i^n|\mathbf{t}_i) > 1 - \epsilon$ for all $n \ge n_0$.

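Lemma 2.1: Let $\{W, V : \mathscr{T} \mapsto \mathscr{Y} \times \mathscr{V}\}$ be two channels such that $\mathcal{D}(W\|V|P_T) > \xi > 0$
and $W \ll V$ with respect to an arbitrary PD $P_T \in \mathcal{P}(\mathscr{T})$. Let $\mathcal{T}^n_{[Y_W V_W|T]_{\delta_n}}(\mathbf{t})$ and $\mathcal{T}^n_{[Y_V V_V|T]_{\delta_n}}(\mathbf{t})$ denote
the corresponding associated conditional I-typical sets, for every $\mathbf{t} \in \mathcal{T}^n_{[T]_{\delta_n}}$. Then, there exists
an index $n_0 \in \mathbb{N}_+$ such that for all $n \ge n_0$ these sets are disjoint and thus $W^n\big(\mathcal{T}^n_{[Y_V V_V|T]_{\delta_n}}(\mathbf{t}) \,\big|\, \mathbf{t}\big) \to 0$
as $n \to \infty$. Furthermore, the quantity $\mathcal{D}(\hat{W}_n\|V|\hat{P}_n) \to \mathcal{D}(W\|V|P_T)$ with $W^n$-probability 1 as
$n \to \infty$.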
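Proof: We must show that given arbitrary $W, V$ such that $\mathcal{D}(W\|V|P_T) > \xi > 0$
with $W \ll V$, then for every sequence $(\mathbf{y},\mathbf{v}) \in \mathcal{T}^n_{[Y_W V_W|T]_{\delta_n}}(\mathbf{t})$ (i.e. $\mathcal{D}(\hat{W}_n\|W|\hat{P}_n) \le \delta_n$ with
$\hat{W}_n \ll W$) and each $\mathbf{t} \in \mathcal{T}^n_{[T]_{\delta_n}}$, there exists $n_0 = n_0(\|\mathscr{T}\|, \|\mathscr{Y}\|, \|\mathscr{V}\|, \xi) \in \mathbb{N}_+$ such that
$\mathcal{D}(\hat{W}_n\|V|\hat{P}_n) > \delta_n$ for all $n \ge n_0$, which implies that $(\mathbf{y},\mathbf{v}) \in \mathcal{T}^n_{[Y_W V_W|T]_{\delta_n}}(\mathbf{t}) \cap [\mathcal{T}^n_{[Y_V V_V|T]_{\delta_n}}(\mathbf{t})]^c$.
To this end, we know from Lemma 1.5 that $\mathcal{D}(\hat{W}_n\|W|\hat{P}_n) \le \delta_n$ implies $|\mathcal{D}(\hat{W}_n\|V|\hat{P}_n) - \mathcal{D}(W\|V|\hat{P}_n)| \le \delta'_n$
with $\delta'_n = -\sqrt{\delta_n/2}\, \log\big(\sqrt{\delta_n/2} / (\|\mathscr{T}\|\|\mathscr{Y}\|^2\|\mathscr{V}\|^2)\big) < \xi$ for all
$n \ge n_0$. We have also used the fact that $\mathbf{t} \in \mathcal{T}^n_{[T]_{\delta_n}}$ and thus $\mathcal{D}(\hat{P}_n\|P_T) \le \delta_n$, yielding
$|\mathcal{D}(W\|V|\hat{P}_n) - \mathcal{D}(W\|V|P_T)| \le \sqrt{2\delta_n}\, \log\|\mathscr{Y}\|\|\mathscr{V}\|$ for all $n \ge n_0$.

Hence, it follows that $\mathcal{D}(\hat{W}_n\|V|\hat{P}_n) \ge \mathcal{D}(W\|V|P_T) - \delta'_n > \xi - \delta'_n$. For instance, for any $\xi > 0$
there exists $n_0 \in \mathbb{N}_+$ such that for all $n \ge n_0$, $\mathcal{D}(\hat{W}_n\|V|\hat{P}_n) > \delta_n$ (with $\delta_n = \xi - \delta'_n$ as $n \to \infty$), which
implies that $(\mathbf{y},\mathbf{v}) \in \mathcal{T}^n_{[Y_W V_W|T]_{\delta_n}}(\mathbf{t}) \cap [\mathcal{T}^n_{[Y_V V_V|T]_{\delta_n}}(\mathbf{t})]^c$. Finally, since $\mathcal{T}^n_{[Y_W V_W|T]_{\delta_n}}(\mathbf{t}) \subset [\mathcal{T}^n_{[Y_V V_V|T]_{\delta_n}}(\mathbf{t})]^c$
for every $\mathbf{t} \in \mathcal{T}^n_{[T]_{\delta_n}}$, we have from Lemma 1.6 that $W^n\big(\mathcal{T}^n_{[Y_V V_V|T]_{\delta_n}}(\mathbf{t}) \,\big|\, \mathbf{t}\big) \to 0$ as $n \to \infty$, concluding
the proof of the first claim. To prove the second assertion, from Lemma 1.5 and the last assertion
we can see that for every $\eta > 0$ there exist $n_0 \in \mathbb{N}_+$ and $(\delta_n)_{n \in \mathbb{N}_+}$ such that the set $\mathscr{E}^n_\eta(\mathbf{t}) =
\{(\mathbf{y},\mathbf{v}) \in \mathscr{Y}^n \times \mathscr{V}^n : |\mathcal{D}(\hat{W}_n\|V|\hat{P}_n) - \mathcal{D}(W\|V|P_T)| > \eta,\ \hat{W}_n \ll V\} \subset [\mathcal{T}^n_{[Y_W V_W|T]_{\delta_n}}(\mathbf{t})]^c$. Hence,
$\Pr\big(\mathscr{E}^n_\eta(\mathbf{t}) \,\big|\, \mathbf{t}\big) \le \epsilon_n$ with $\epsilon_n(\delta_n) \to 0$ as $n \to \infty$, which means that $\sum_{n=1}^{\infty} \Pr\big(\mathscr{E}^n_\eta(\mathbf{t}) \,\big|\, \mathbf{t}\big)$ converges for
all $\eta > 0$. The proof concludes by the Borel-Cantelli Lemma [55]. ■
We now construct a formal definition of the decoding sets used to achieve the EIO capacity.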
We now construct a formal definition of the decoding sets used to achieve the EIO capacity. 

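Definition 2.2 (Robust $\epsilon$-decoding sets): Let $\mathscr{T}_0 \subset \mathscr{T}^n$ denote a set of transmit sequences. A
set $\mathscr{D} \subset \mathscr{Y}^n \times \mathscr{V}^n$ is called a robust $\epsilon$-decoding set for a sequence $\mathbf{t} \in \mathscr{T}_0$ and an unknown
DMC $\{W_\theta : \mathscr{T} \times \Theta \mapsto \mathscr{Y} \times \mathscr{V}\}$ with $\theta \in \Theta$ if the conditional (w.r.t. $\hat\theta$) probability of $\theta$, for
which the $W_\theta$-probability of $\mathscr{D}$ exceeds $1 - \epsilon$, is at least $1 - \gamma$, i.e.,
$\Pr\big(\{\theta \in \Theta : W_\theta^n(\mathscr{D}|\mathbf{t},\theta) > 1 - \epsilon\} \,\big|\, \hat{\boldsymbol\theta} = \hat\theta\big) \ge 1 - \gamma$.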
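
For a finite state set, the condition in Definition 2.2 reduces to summing the conditional probability of the states for which the candidate set succeeds. The sketch below is only an illustration: the four states, the conditional PD of the state given the estimate, and the per-state success probabilities are hypothetical numbers, and the function names are not taken from the paper.

```python
def is_confidence_set(posterior, subset, gamma):
    """True when the conditional probability of the state falling outside subset is below gamma."""
    outside = sum(p for theta, p in posterior.items() if theta not in subset)
    return outside < gamma


def is_robust_decoding_set(posterior, success_prob, eps, gamma):
    """Check the condition of Definition 2.2 for a finite set of channel states.

    posterior[theta]    : conditional probability of state theta given the estimate
    success_prob[theta] : probability of the candidate decoding set under state theta
    """
    good_states = {theta for theta, w in success_prob.items() if w > 1 - eps}
    mass = sum(posterior[theta] for theta in good_states)
    return mass >= 1 - gamma


# Hypothetical four-state example
posterior = {0: 0.6, 1: 0.3, 2: 0.08, 3: 0.02}
success_prob = {0: 0.999, 1: 0.995, 2: 0.97, 3: 0.2}

print(is_robust_decoding_set(posterior, success_prob, eps=0.05, gamma=0.05))
print(is_robust_decoding_set(posterior, success_prob, eps=0.05, gamma=0.01))
print(is_confidence_set(posterior, {0, 1, 2}, gamma=0.05))
```

In this toy case the decoding set fails only under the least likely state, so it is robust at outage level 0.05 but not at 0.01.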
Proposition 2.3: A set $\Lambda \subset \Theta$ is called a confidence set for $\theta$ if $\Pr(\theta \notin \Lambda \,|\, \hat{\boldsymbol\theta} = \hat\theta) < \gamma$, where
$\gamma$ denotes the confidence level. If $\Lambda$ is a confidence set of level $\gamma$ and $\mathscr{D}$ is a common $\eta$-image
of the collection of DMCs $\{W_\theta : \mathscr{T} \times \Theta \mapsto \mathscr{Y} \times \mathscr{V}\}_{\theta \in \Lambda}$, then $\mathscr{D}$ is also a robust $\epsilon$-decoding
set with $\epsilon = 1 - \eta$.

The statement follows from the fact that any conditional PD is $\Theta$-measurable and from basic
properties of measurable functions (see [55, p. 185]).

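Definition 2.4 (Robust I-typical sets): Robust $\epsilon$-decoding sets can be implemented by introducing
the concept of robust I-typical sets. A robust I-typical set is defined as

$$\mathcal{B}^n_{[Y_\Lambda V_\Lambda|T\hat\theta]_{\delta_n}}(\mathbf{t}) = \bigcup_{\theta \in \Lambda} \mathcal{T}^n_{[Y_\theta V_\theta|T\theta]_{\delta_n}}(\mathbf{t},\theta),$$

for an arbitrary subset $\Lambda \subset \Theta$ and $\delta$-sequence $(\delta_n)_{n \in \mathbb{N}_+}$.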

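Lemma 2.5: Given $0 < \gamma, \epsilon < 1$, a necessary and sufficient condition for a robust I-typical
set $\mathcal{B}^n_{[Y_\Lambda V_\Lambda|T\hat\theta]_{\delta_n}}$ to be a robust $\epsilon$-decoding set is that $\Lambda$ be a confidence set of level $\gamma$.

Proof: We start proving the necessary part of this condition, namely $\Pr(\Lambda \,|\, \hat{\boldsymbol\theta} = \hat\theta) \ge 1 - \gamma$
(i.e. $\Lambda$ is a confidence set) implies $\Pr(\Lambda_\epsilon \,|\, \hat{\boldsymbol\theta} = \hat\theta) \ge 1 - \gamma$ with $\Lambda_\epsilon = \{\theta \in \Theta :
W_\theta^n(\mathcal{B}^n_{[Y_\Lambda V_\Lambda|T\hat\theta]_{\delta_n}} \,|\, \mathbf{t},\theta) > 1 - \epsilon\}$ (i.e., $\mathcal{B}^n_{[Y_\Lambda V_\Lambda|T\hat\theta]_{\delta_n}}$ is a robust $\epsilon$-decoding set). From Proposition
1.4-(ii) it is easy to see that $\mathcal{B}^n_{[Y_\Lambda V_\Lambda|T\hat\theta]_{\delta_n}}$ is a common $\eta$-image for the collection of DMCs $\mathcal{W}_\Lambda$
(with $\eta = 1 - \epsilon$), and thus the proof follows as a consequence of Proposition 2.3. In order to
prove the sufficiency condition, we need to show that if $\mathcal{B}^n_{[Y_\Lambda V_\Lambda|T\hat\theta]_{\delta_n}}$ is a robust $\epsilon$-decoding set then
$\Lambda$ must be a confidence set of level $\gamma$. Instead of this, we shall show the converse implication,
namely $\Pr(\Lambda \,|\, \hat{\boldsymbol\theta} = \hat\theta) < 1 - \gamma$ (i.e. $\Lambda$ is not a confidence set) implies that $\Pr([\Lambda_\epsilon]^c \,|\, \hat{\boldsymbol\theta} = \hat\theta) > 1 - \gamma$
where $[\Lambda_\epsilon]^c = \{\theta \in \Theta : W_\theta^n(\mathcal{B}^n_{[Y_\Lambda V_\Lambda|T\hat\theta]_{\delta_n}} \,|\, \mathbf{t},\theta) \le \epsilon\}$ (i.e. $\mathcal{B}^n_{[Y_\Lambda V_\Lambda|T\hat\theta]_{\delta_n}}$ is not a robust $\epsilon$-decoding
set). Actually, from Lemma 2.1 we note that for all $\theta \in [\Lambda]^c \cap \Theta$ there exists $n_0 \in \mathbb{N}_+$ such that
$W_\theta^n(\mathcal{B}^n_{[Y_\Lambda V_\Lambda|T\hat\theta]_{\delta_n}} \,|\, \mathbf{t},\theta) \le \epsilon$ provided $n \ge n_0$. Consequently, the proof follows immediately by
noting that $[\Lambda]^c \cap \Theta \subset [\Lambda_\epsilon]^c$ and $\Pr([\Lambda]^c \cap \Theta \,|\, \hat{\boldsymbol\theta} = \hat\theta) > 1 - \gamma$. ■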
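Theorem 2.6 (Cardinality of robust I-typical sets): Consider an arbitrary collection of conditional
PDs (or channels) $\{W_\theta : \mathscr{T} \times \Theta \mapsto \mathscr{Y} \times \mathscr{V}\}_{\theta \in \Lambda}$ together with its associated robust
I-typical set $\mathcal{B}^n_{[Y_\Lambda V_\Lambda|T\hat\theta]_{\delta_n}}(\mathbf{t}) = \bigcup_{\theta \in \Lambda} \mathcal{T}^n_{[Y_\theta V_\theta|T\theta]_{\delta_n}}(\mathbf{t},\theta)$, for all $\mathbf{t} \in \mathcal{T}^n_{[T]_{\delta_n}}$. Then, there exists an
index $n_0 \in \mathbb{N}_+$ such that for all $n \ge n_0$ the size of $\mathcal{B}^n_{[Y_\Lambda V_\Lambda|T\hat\theta]_{\delta_n}}(\mathbf{t})$ can be bounded as follows:

$$\left| \frac{1}{n} \log \big\|\mathcal{B}^n_{[Y_\Lambda V_\Lambda|T\hat\theta]_{\delta_n}}(\mathbf{t})\big\| - H(Y_\Lambda, V_\Lambda \,|\, T, \hat{\boldsymbol\theta} = \hat\theta) \right| \le \eta_n,$$

where $H(Y_\Lambda, V_\Lambda \,|\, T, \hat{\boldsymbol\theta} = \hat\theta) = \sup_{\theta \in \Lambda} H(Y_\theta, V_\theta \,|\, T, \hat{\boldsymbol\theta} = \hat\theta)$ and $\eta_n \to 0$ as $\delta_n \to 0$ and $n \to \infty$.

The quantity $H(Y_\Lambda, V_\Lambda \,|\, T, \hat{\boldsymbol\theta} = \hat\theta)$ may be interpreted as the conditional entropy of the set $\mathcal{W}_\Lambda$
and can be shown to equal the I-projection (cf. [53]) of the uniform PD on the set $\mathcal{W}_\Lambda$. Before
proving this theorem we need the following result.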

Lemma 2.7: Consider an arbitrary set of DMCs = {Wg : 3^ x O i — > ^ ^] eeA 
together with its associated set of I-typical sequences (t) = 1!^^ ,^ , (t,^) and 

let the set ^]^](t) = |J 7fy„](t) with = W,i n ^„(^ x Y), for every t G Then, the 
size of e5^" is bounded as 



< \\^\\\\^\\\\y\\n-Hog(l + n) 



Furthermore, if the set Wa is convex then the upper bound can be replaced by 11 < 

[yAVAl-t (>is„ 

exp < n max iJ(V^|P„) \. 

The lower and upper bounds for a non-convex set can be easily proved, while for a
convex set this follows straightforwardly as a generalization of results found in [57].

Proof: Theorem \2.6\ We first show that the size of ^ is asymptotically equal to 
the size of (t) = |J (t), where T = n x r) is the intersection of with 

the set !^n{'3/' x "V) of empirical distributions induced by sequences of length n. In particular, 
there exists an index uq such that for all n > uq and t G TJ



(t)|| < (l + n)ll^llll"^llll^'ll||,^5](t 



The lower bound in (|53] ) is obvious. Let us assume that there exists a sequence of (e„)„gN+ 
that for all n> uq and each t G 5^", 



(53) 



such 



(54) 



from which the upper bound in (l53l) follows as 



[Y0Ve\Te]s„ 



9eA 



(a) 



< Ell^v„u(t) 



< (l + n)ll'^llll^lll|-^ll||^^](t 



(55) 



and each t G we have T[;^,^](t) n Tp^^](t 



where (a) follows from (l54l) and the union bound, (b) follows from (t)|| < (1 + 

^)!|.^!ll|J«'lll|-^il||^n^^(t)|| and the fact that for every Vn{-\t),Vn{-\t) G E with V;(-|t) ^ K(-|t) 

{0}- 

Now we turn to prove statement (|54l ). First of all, note that W^i is a relatively rg-open subset 
of Wyi U ^ri(^^ X y) in the ro-topolog)o [36], i.e., every W G Wyi has a ro-neighborhood such 
that the e-open ball Uq{W, ^,e) C W^i. Hence, given an arbitrary sequence {en)nGN+ the En- 
open ball satisfies Uo{W, ^,en) H x Y) c Wa with large enough n. As a consequence, 
there exists an index 7io G N+ such that for all n > no, we can choose the sequence e'^ = 

vC^log(v^/(||=^||ll^f lirf))] and pick a conditional PD Vn{-\t) G 

W{y,v\t) 



UoiW, ^,e„) n ^ni^ X y) with V;(-|t) > such that W{y,v\t)\og- 



< e' for 



Vn{yMt) 

all (y, t>) G X 7^ and -Pn(t) > 0. On the other hand, observe that any sequence (y,v) G 

'The ro-topology on ^(^x^) si defined by the basic neighborhoods Uo{W, = £ ^(^x^) : |W(y,i;|t)- 

V{y,v\t)\ < = if W{y,v\t) = 0, for all Prit) > O}. 
T^^l^^j^ (t) implies ©(W^^H 11/|P„) < 5^ with Wn <^ W and by continuity Lemma O this
leads to D(Wn\\Vn\Pn) < V{W\\Vn\P„) - ,/6j2\og{./6j2/{\\£r\\\\^f\\rf)). Then, it is 
easy to see that D(11/„||V^|P„) < e„ and therefore (y, v) G T^^^ j^ (t). This proves that for each 
W G Wyi and sufficiently large n, it is possible to find a conditional PD Vn{-\t) G S and a 
sequence (e„)„eN+ such that T^^|^^j^ (t) C T[^^j^ (t), which establishes (|54l) . 

Using similar arguments as above and the uniform continuity of the entropy function, it can 
be shown that there exists tiq G N+ and (^OnGN+ such that for all n > u'q and each t G 'JTli^, , 

ma^H{V^\P^) -snpH{Ye,Ve\T,6 = 6) <C (56) 

with ^ as n oo. Finally, the theorem follows by combining inequalities (|53l) with 
LemmaOand inequalities and by setting r]n = + 2|| ^|| ||^|| ||f'||r2"Mog(n+l), for all 
n > max{nQ, no}. ■ 



Appendix III 
Information Inequalities 

Given arbitrary measurable functions {fk '■ '3^ >^ ^ '^}k-i ^^'^ numbers {A^ G 



the set £ = {We{y,v\t,e) G ^(^T x r) : EE t^lt, ^) = A^, 1 < A; < 

K and t G 5^} if non-empty, is called a linear family of conditional PDs [40]. 

Theorem 3.1: For an arbitrary set of states yl C 0, let = {Wg : 5^x6) i — > ^x r}^^^ C 
X y) he a convex set of conditional PDs (or channels) with finite input, state and output 
alphabets r, ^) and let We*{y,v\t, 9) G be the channel such that Sp{We*) = Sp{Wa), 
respect to a PD qrpx\ue ^ as defined in (flOl) . Then the following inequality holds 

miI{T-Yy\V^,e)<I{T-Ye\Ve)- \v{{Ye,Ve)\\{Ye^,Ve^)\T,e) - V{{Ye,Ve)\\{Ye^,Ve^)\e) 
ag/1 L 

(57) 
for every state $\theta \in \Lambda$. Furthermore, if the asserted inequality holds for some $\theta^* \in \Lambda$ and all
$\theta \in \Lambda$, then $\theta^*$ must provide the infimum value of the mutual information over the set $\Lambda$, i.e.,
IJT; Ye*\V0*, 9) = inf I(T; YglVe, 9) + e with e > 0. Moreover, the inequality ([57]) is actually an 
equality if is a linear family of conditional PDs (Wyi C L). 

Proof: Let e > and Wg*{y,v\t,9) with 9* E A he the channel state that yields to 
I,{T]Yg*\Ve*,9) = ini I{T;Yx\Vx,9) + e. For arbitrary We{y,v\t,9) with state 9 E A, the 
convexity of guarantees that W^*(?/, v\t, 9) = {1 - a)We*{y, v\t, 9) + aW6{y,v\t, 9) E Wa 
for all a E [0,1]. Observe that W^"gl(?/, f |t, ^) is linear in a and /(T; Fe| V^, ^) is a convex 
function in Wg{y,v\t, 9), which implies that I{T; ^^^1 |V^''el, 9) is a convex function in a. Hence, 
the difference quotient of /(T; Yg'^l\Vg^l, 9) evaluated in a = is given by, 

AM = 0) = 7 V^T- Y^%\V^%, 9) - hiT- YeAVe*, 9)] , (58) 
with At{a = 0) > for each t E (0, 1]. Thus, there exits some i E (0,t) such that 



0< A(a = 0) = ^/(T;r;;l|<l,, 



(59) 

a=t 



While, 

Yi;l\Ve% ^) = E E E ^rioim [W,(y, v\t, ^)-W,.(y, v\t, 9)] log J^f^^'""'''^] 

(60) 

and by taking t — in expression (|59l) . we obtain 



< limA(« = 0)=limA/(T;yJ°)|\/;^)^ 



EE E ^Ti^(^l^) [W,(y, ^) - We4y,v\t, 9)] log 



'e*{y,v\t,9) 



^e*qTdy,v\9) 



= I{T;Ye\Ve) + V{{Ye,VemYg.,Ve*)\9)-V{{Ye,VemYe*,Ve*)\T,9)-h^^^^ 

(61) 
where we have used the fact that $\mathrm{Sp}(W_\theta) \subset \mathrm{Sp}(W_{\theta^*})$. Since expression (61) is always positive
this concludes the proof of the inequality (57). In order to show the equality, observe that under
the assumption that Wyi is a linear family of conditional PDs. For every We{y,v\t, 9) E there 
is some a < such that ^^^"^^(y, f |t, 6) = {l-a)W84y,v\t, 6) + aWe{y,v\t, 6) e Therefore, 
we must have {d/da)I{T;Y,^;l\vi;l9)l^^ = 0, i.e. 

E E E iTieim [We{y, v\t, 6) - Wo^y, v\t, 9)] log J'^^^'^'^'^i = 0, 



ver 



We*qT\eiy.v\0) 



for all We{y, v\t, 6) E £, and this proves the equality in (|52 



Appendix IV 
Evaluation of Some Indefinite Integrals 

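In this Appendix we want to evaluate the following indefinite integral defined by (45)

$$I(r_{\mathrm{opt}}, \phi_E) = \int_{r_{\mathrm{opt}}}^{\infty} \int_{\phi_{\hat{H}} - \phi_E}^{\phi_{\hat{H}} + \phi_E} D\, r \exp\big(-A r^2 + B r \cos(\phi_H - \phi_{\hat{H}})\big)\, dr\, d\phi_H, \qquad (62)$$

where $r_{\mathrm{opt}}, A, B \in \mathbb{R}_+$ with $D = \dfrac{A}{\pi} \exp\Big(-\dfrac{B^2}{4A}\Big)$ and $\phi_E \in [-\pi, \pi]$. First, we use integration by
parts and the series expansion of [58, Eq. 6.9] to obtain

$$\int_{\phi_{\hat{H}} - \phi_E}^{\phi_{\hat{H}} + \phi_E} \exp\big(B r \cos(\phi_H - \phi_{\hat{H}})\big)\, d\phi_H = 2 \int_0^{\phi_E} \Big[ I_0(Br) + 2 \sum_{k=1}^{\infty} I_k(Br) \cos(k\phi_H) \Big] d\phi_H$$

$$= 2 I_0(Br)\, \phi_E + 4 \sum_{n=1}^{\infty} I_n(Br) \left( \frac{\sin(n\phi_E)}{n} \right), \qquad (63)$$

where $I_n(x)$ is the $n$-th order modified Bessel function of the first kind [42, Eq. (8.445)]

$$I_n(x) = \sum_{k=0}^{\infty} \frac{1}{k!\, \Gamma(n+k+1)} \left( \frac{x}{2} \right)^{n+2k}, \qquad (64)$$

and $\Gamma(\cdot)$ is the Gamma function. We now compute the remainder term given by

$$\int_{r_{\mathrm{opt}}}^{\infty} r \exp(-A r^2)\, I_n(Br)\, dr = \frac{1}{2A} \exp\Big( \frac{B^2}{4A} \Big)\, Q_{1,n}\Big( \frac{B}{\sqrt{2A}}, \sqrt{2A}\, r_{\mathrm{opt}} \Big), \qquad (65)$$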



June 9, 2009 



To appear in IEEE Transactions on Information Theory 






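and $Q_{1,n}(\alpha, \beta)$ is the Nuttall Q-function defined by [59]

$$Q_{1,n}(\alpha, \beta) = \int_{\beta}^{\infty} x \exp\Big( -\frac{x^2 + \alpha^2}{2} \Big)\, I_n(\alpha x)\, dx, \qquad (66)$$

with non-negative reals $\alpha, \beta$ (see [60] for its numerical evaluation). Actually, the integral in (62)
follows from (63) and (65),

$$I(r_{\mathrm{opt}}, \phi_E) = \frac{1}{\pi} \left[ Q_1\Big( \frac{B}{\sqrt{2A}}, \sqrt{2A}\, r_{\mathrm{opt}} \Big) \phi_E + 2 \sum_{k=1}^{\infty} Q_{1,k}\Big( \frac{B}{\sqrt{2A}}, \sqrt{2A}\, r_{\mathrm{opt}} \Big) \frac{\sin(k\phi_E)}{k} \right], \qquad (67)$$

where $Q_1(\alpha, \beta) = Q_{1,0}(\alpha, \beta)$ is the first-order Marcum Q-function [43]. This infinite sum does
not seem to be amenable to further simplifications yielding a closed-form expression. Numerical
simulations showed that it can be well-approximated using only two terms, i.e.,

$$I(r_{\mathrm{opt}}, \phi_E) \approx \frac{1}{\pi} \left[ Q_1\Big( \frac{B}{\sqrt{2A}}, \sqrt{2A}\, r_{\mathrm{opt}} \Big) \phi_E + 2\, Q_{1,1}\Big( \frac{B}{\sqrt{2A}}, \sqrt{2A}\, r_{\mathrm{opt}} \Big) \sin(\phi_E) \right]. \qquad (68)$$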
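
The angular integration step (63) can be verified numerically. The following is a minimal sketch using only the Python standard library: the modified Bessel function is evaluated from the truncated power series in (64), and the left-hand side is computed with Simpson's rule. The values of `z` (standing for $Br$) and `phi_e`, the truncation orders, and the helper names are all illustrative assumptions.

```python
from math import cos, exp, factorial, gamma, sin


def bessel_i(n: int, x: float, terms: int = 40) -> float:
    """Modified Bessel function of the first kind via its truncated power series."""
    return sum((x / 2) ** (n + 2 * k) / (factorial(k) * gamma(n + k + 1))
               for k in range(terms))


def simpson(f, a: float, b: float, m: int = 2000) -> float:
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


z, phi_e = 1.5, 0.7  # illustrative values of B*r and of the half-width phi_E

# Left-hand side of (63), with the integration variable centred at the estimated phase
lhs = simpson(lambda phi: exp(z * cos(phi)), -phi_e, phi_e)

# Right-hand side of (63), truncated after 30 harmonics
rhs = 2 * bessel_i(0, z) * phi_e + 4 * sum(
    bessel_i(n, z) * sin(n * phi_e) / n for n in range(1, 31))

print(lhs, rhs)
```

The Bessel coefficients decay factorially in the order for a fixed argument, so the harmonic series converges quickly in this check.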



The evaluation of the expectation in (|4TI) is obtained by computing the following integral 

I{A,B,P) = jj^ x\og2iA + Bx)exp(^-^^dx, 



(69) 



APJ \AP^ 

where Ei{z) = f^t^^exp{—t)dt denotes the exponential integral function. 
(a) Mean EIO capacity with perfect feedback (dashed lines), without feedback
(dotted lines), mean capacity based on the averaged error probability (dashed-dot
line) and mean Shannon capacity (solid line) vs. SNR, for $\gamma \in \{0.1, 0.01\}$ and
$N = 1$.







(b) Mean EIO capacity with perfect feedback (dashed lines), achievable EIO
rates associated to the mismatched ML decoder (dotted lines) and mean Shannon
capacity (solid line) vs. SNR, for $\gamma = 0.01$ and $N \in \{1, 3, 10\}$.
(c) Mean EIO capacity with perfect feedback (with and without power allocation,
dashed line), with rate-limited feedback $R_{FB} \in \{1, 3, 8\}$ (power allocation, dotted
lines), ergodic capacity (solid line) and mean Shannon capacity (dashed-dot line)
vs. SNR, for $\gamma = 0.01$ and $N = 1$.






(d) Mean EIO capacity for different Rice factors $K_h \in \{-15, 0, 25\}$ dB and
amounts of training $N \in \{1, 3\}$ with perfect feedback vs. SNR.